NONPARAMETRIC REGRESSION WITH HOMOGENEOUS GROUP TESTING DATA

By Aurore Delaigle and Peter Hall 

University of Melbourne 

We introduce new nonparametric predictors for homogeneous 
pooled data in the context of group testing for rare abnormalities and 
show that they achieve optimal rates of convergence. In particular, 
when the level of pooling is moderate, then despite the cost savings, 
the method enjoys the same convergence rate as in the case of no 
pooling. In the setting of "over-pooling" the convergence rate differs 
from that of an optimal estimator by no more than a logarithmic factor.
Our approach improves on the random-pooling nonparametric
predictor, which is currently the only nonparametric method available,
unless there is no pooling, in which case the two approaches are
identical. 

1. Introduction. In large screening studies where infection is detected by 
testing a fluid (e.g., blood, urine, water, etc.), data are often pooled in groups 
before the test is carried out, which permits savings in time and money. 
This technique, known as group testing, dates back at least to the Second 
World War, where Dorfman (1943) suggested using it to detect syphilis in US 
soldiers. It has been used in a variety of large screening studies, for example, 
to detect human immunodeficiency virus, or HIV [Gastwirth and Hammick 
(1989)], but pooling is also employed to detect pollution, for example, in 
water or milk; see Nagi and Raggi (1972), Wahed et al. (2006), Lennon 
(2007), Fahey, Ourisson and Degnan (2006). Often in these studies, one or 
several explanatory variables are available, in which case it is generally of 
interest to estimate the conditional probability of infection. This problem 
has received considerable attention in the group testing literature, where 
most suggested techniques are parametric; see, for example, Vansteelandt, 
Goetghebeur and Verstraeten (2000), Bilder and Tebbs (2009) and Chen,
Tebbs and Bilder (2009). Related work includes that of Chen and Swallow
(1990), Gastwirth and Johnson (1994), Hardwick, Page and Stout (1998) 
and Xie (2001). 

Thus, although the original purpose of group testing was merely to identify
infected individuals more economically, the idea has since been expanded
extensively to include more general statistical methodology when the data
have to be gathered through grouping. Our paper contributes in this context,
developing and describing a particularly effective approach to nonparametric
regression. Obtaining information in this way can be useful on its own,
or for planning a subsequent study. 

Recently, Delaigle and Meister (2011) suggested a nonparametric estimator
of the conditional probability of infection. Their method enjoys optimal
convergence rates when pooling is random, but it is not consistent in the 
case of nonrandom, homogeneous pooling, which can be defined as a setting 
where the covariates of individuals in a group take similar values. In the 
parametric context it is well known that homogeneous grouping improves 
the quality of estimators, but the potential gains of homogeneous grouping 
are even greater in the nonparametric context, where random grouping in 
moderate to large groups can seriously degrade the quality of estimators. 

We demonstrate that, when the data are grouped homogeneously, one can
construct more accurate nonparametric estimators of the conditional probability
of infection. We show that these improved estimators enjoy faster,
and optimal, convergence rates in a variety of contexts. Having reliable
estimators of the conditional probability of infection enables more accurate
identification of vulnerable categories of people, and can lead to subsequent
studies that can assist individuals who are particularly vulnerable to infection.
We illustrate the practical performance of our procedure via simulated
examples and an application to the National Health and Nutrition Examination
Survey (NHANES) study, a large health and nutrition survey collected
in the US; see www.cdc.gov/nchs/nhanes.htm for more about the NHANES 
research program. 

2. Model and methodology. 

2.1. Main group testing model. We observe independent and identically
distributed (i.i.d.) data $X_1, \ldots, X_N$, where $X$ is a covariate observed on each
of $N$ respective objects (e.g., items or individuals), each of which is subject
to a potential, relatively rare "abnormality." For example, $X$ could be the
age or weight of an individual, and the abnormality could be contamination
by HIV. Let $Y_i$ denote the result of a test on the $i$th object, such as a blood or
urine test. That is, $Y_i$ takes the value 1 or 0 according to whether the
abnormality is detected or not, respectively. In large screening studies, where $N$
is very large, testing each individual for contamination can be too expensive
or take too much time, and to overcome this difficulty, it is common to pool
data on several individuals before performing the detection test.

Pooling is performed by partitioning the original dataset $\mathcal{X}$, comprised of
the values $X_1, \ldots, X_N$, into $J$ subsets, or groups, $\mathcal{X}_1, \ldots, \mathcal{X}_J$, say, where $\mathcal{X}_j$
is of size $n_j$ and $n_1 + \cdots + n_J = N$. We denote the elements of $\mathcal{X}_j$ by
$X_{1j}, \ldots, X_{n_j j}$. Each $X_{ij}$ corresponds to an $X_k$, and each $X_k$ has a
concomitant $Y_k$. If the $i$th element $X_{ij}$ of $\mathcal{X}_j$ is $X_k$, then the concomitant of $X_{ij}$
is $Y_{ij} = Y_k$. Instead of trying to determine the value of $Y_{ij}$ directly, each
group $\mathcal{X}_j$ is tested to discover whether the abnormality is present in the
group, that is, to determine the value of

$$Y_j^* = \max_{1 \le i \le n_j} Y_{ij} = \begin{cases} 1, & \text{if } Y_{ij} = 1 \text{ for some } i \text{ in the range } 1 \le i \le n_j, \\ 0, & \text{otherwise.} \end{cases}$$

Of course, $Y_j^*$ is obtained without observing the $Y_{ij}$'s directly; for example,
when the abnormality is detected by a blood test, the bloods of all
individuals in a group are mixed together, and this mixed blood is tested
for contamination. From the data pairs $(\mathcal{X}_j, Y_j^*)$ we wish to estimate the
probability function $p(x) = P(Y_i = 1 \mid X_i = x) = E(Y_i \mid X_i = x)$.
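
To make the pooling mechanism concrete, the following minimal Python sketch forms the pooled responses $Y_j^*$ from individual results. It is an illustration only: the function names, the representation of groups as lists of 0-based indices, and the helper that builds a random partition are assumptions made here and do not come from the paper. In a real study only the pooled results are observed; the individual $Y_i$'s appear below only because the data are simulated.

```python
import random


def pooled_responses(y, groups):
    """Return Y*_j = max_i Y_ij for each group of individual indices."""
    return [max(y[i] for i in group) for group in groups]


def random_partition(n, sizes, rng):
    """Split the indices 0..n-1 at random into groups of the given sizes.

    The sizes n_1, ..., n_J must sum to n, mirroring n_1 + ... + n_J = N.
    """
    if sum(sizes) != n:
        raise ValueError("group sizes must sum to n")
    indices = list(range(n))
    rng.shuffle(indices)
    groups, start = [], 0
    for size in sizes:
        groups.append(indices[start:start + size])
        start += size
    return groups


# Hypothetical example: N = 12 individuals, prevalence 0.1, J = 3 groups.
rng = random.Random(0)
y = [1 if rng.random() < 0.1 else 0 for _ in range(12)]
groups = random_partition(12, [4, 4, 4], rng)
y_star = pooled_responses(y, groups)  # one 0/1 test result per group
```

A group tests positive as soon as one of its members is positive, which is why a single pooled result carries less information than the individual results it replaces.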

Since $p$ is a regression curve, then if the sample $(X_i, Y_i)$, $i = 1, \ldots, N$, were
observed, we could use standard nonparametric regression techniques such
as, for example, local polynomial estimators. Let $\ell \ge 0$ be an integer, $h > 0$
a bandwidth, $K$ a kernel function and $K_h(x) = h^{-1} K(x/h)$. The standard
$\ell$th degree local polynomial estimator of $p$ is defined by

$$\hat p_S(x) = (1, 0, \ldots, 0)\, Q^{-1} R, \tag{2.1}$$

where $R = (R_0(x), \ldots, R_\ell(x))^T$, $Q = (Q_{ij})_{1 \le i, j \le \ell + 1}$, with $Q_{ij} = Q_{i+j-2}(x)$,
and where $Q_k(x) = \sum_{i=1}^N (X_i - x)^k K_h(X_i - x)$ and
$R_k(x) = \sum_{i=1}^N Y_i (X_i - x)^k K_h(X_i - x)$. See, for example, Fan and Gijbels (1996). Of course, when
the data are pooled, the $Y_i$'s are not available, and we cannot calculate such
estimators. Therefore we need to develop specific ways to estimate $p$ from
pooled data.
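
As an illustration of (2.1), here is a minimal, dependency-free Python sketch of a local polynomial estimator for nongrouped data. The Gaussian kernel, the function names and the small Gaussian-elimination solver are choices made for this sketch, not prescriptions from the paper; the matrix `q` uses 0-based indices, so its entry `(i, j)` holds the moment of order `i + j`.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def local_polynomial(x0, xs, ys, h, degree=1, kernel=gaussian_kernel):
    """Evaluate (1, 0, ..., 0) Q^{-1} R at the point x0."""
    moments = [0.0] * (2 * degree + 1)  # Q_0(x0), ..., Q_{2 ell}(x0)
    rhs = [0.0] * (degree + 1)          # R_0(x0), ..., R_ell(x0)
    for xi, yi in zip(xs, ys):
        d = xi - x0
        w = kernel(d / h) / h           # K_h(X_i - x0)
        for k in range(2 * degree + 1):
            moments[k] += d ** k * w
        for k in range(degree + 1):
            rhs[k] += yi * d ** k * w
    q = [[moments[i + j] for j in range(degree + 1)]
         for i in range(degree + 1)]
    return solve_linear_system(q, rhs)[0]
```

A useful sanity check is that a fit of degree $\ell$ reproduces any polynomial of degree at most $\ell$ exactly, whatever the bandwidth.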

2.2. Method for homogeneous pools. Depending on the study, it is not
always possible to observe the $X_i$'s before pooling the data, so that the
individuals are pooled randomly. This is the context of the work of Delaigle
and Meister (2011), who constructed a nonparametric estimator for the case
where data $X_i$ are assigned randomly to the groups $\mathcal{X}_j$. See Appendix A.1 of
the supplemental article [Delaigle and Hall (2011)] for a summary of
properties of their estimator. In other studies, the $X_i$'s are observed beforehand;
see, for example, the study of hepatitis C infection among 10,654 health care
workers in Scotland, carried out by Thorburn et al. (2001). In such cases,
it has already been demonstrated in the parametric context that it can
be greatly advantageous to pool the data nonrandomly; see Vansteelandt,
Goetghebeur and Verstraeten (2000).

Unfortunately, the only nonparametric estimator available for group testing
data [see Delaigle and Meister (2011)] crucially relies on random grouping
and is not valid when homogeneous groups are created. Below we suggest 
a new nonparametric approach which is valid with homogeneous pooling. We 
introduce our procedure in the case of a single covariate and equally sized 
groups. Generalizations of our method to unequal group sizes and multiple 
covariates will be treated in Section 5. These generalizations are similar in 
most respects. 

To create homogeneous pools we divide the data into groups of equal
number, taking the $j$th group to be $\mathcal{X}_j = \{X_{((j-1)\nu+1)}, \ldots, X_{(j\nu)}\}$, where
$\nu = n_j$, in this case not depending on $j$, is the number of data in each group,
and $X_{(1)} \le \cdots \le X_{(N)}$ denotes an ordering of the data in $\mathcal{X}$. We assume
that $\nu$ divides $N$; the case where it does not is a particular case of our
generalization in Section 5. Note that, with $Z_j^* = 1 - Y_j^*$,

$$E(Z_j^* \mid \mathcal{X}) = \prod_{i=1}^{\nu} \{1 - p(X_{ij})\}. \tag{2.2}$$

The right-hand side here is generally close to $\{1 - p(\bar X_j)\}^\nu$, where
$\bar X_j = \nu^{-1} \sum_i X_{ij}$ denotes the average value of the $X_{ij}$'s in the $j$th group, and that
closeness motivates the definition of $\hat p(x)$ at (2.4), below. Let

$$\mu(x) = \{1 - p(x)\}^\nu. \tag{2.3}$$

Reflecting (2.2) and the above discussion, we suggest estimating $p(x)$ by

$$\hat p(x) = 1 - \hat\mu(x)^{1/\nu}, \tag{2.4}$$

where $\hat\mu$ is a nonparametric estimator of $\mu$.

It remains to estimate $\mu$. We begin by giving motivation for our methodology.
Since, by construction, the groups are homogeneous, the observations in
a given group are similar. In particular, $p(X_{((j-1)\nu+1)}), \ldots, p(X_{(j\nu)})$ are well
approximated by $p(\bar X_j)$. Together, this and identity (2.2) suggest that $\mu(\bar X_j)$
can be approximated by $E(Z_j^* \mid \mathcal{X}_j)$, so that $\mu(x)$ is approximately equal to
the average of the $E(Z_j^* \mid \mathcal{X}_j)$'s over the $\bar X_j$'s close to $x$, which can be
estimated by standard nonparametric regression estimators calculated from
the data $(\bar X_j, Z_j^*)$, $j = 1, \ldots, J$. Motivated by these considerations, we define
an $\ell$th order local polynomial estimator of $\mu$, constructed from the data
$(\bar X_j, Z_j^*)$, by

$$\hat\mu(x) = (1, 0, \ldots, 0)\, S^{-1} T, \tag{2.5}$$

where $T = (T_0(x), \ldots, T_\ell(x))^T$ and $S = (S_{ij})_{1 \le i, j \le \ell + 1}$, with $S_{ij} = S_{i+j-2}(x)$,
$S_k(x) = \sum_j (\bar X_j - x)^k K_h(\bar X_j - x)$, and $T_k(x) = \sum_j Z_j^* (\bar X_j - x)^k K_h(\bar X_j - x)$.

We shall show in Section 3 that this approach is well founded, by proving
consistency of the resulting estimator $\hat p$ of $p$. We shall develop our theoretical
results for a larger class of estimators which encompasses the estimator
at (2.5).
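
The construction in this section (sort, pool consecutive blocks, smooth the pooled negatives against the group means, then invert) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Gaussian kernel, the closed-form local linear fit, the function names and, in particular, the clipping of the smoothed value to [0, 1] before taking the root are all choices made here.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pool_homogeneously(x, y, nu):
    """Sort by covariate, then pool consecutive blocks of nu individuals.

    Returns the group averages and Z*_j = 1 - Y*_j for each group.
    """
    if len(x) % nu != 0:
        raise ValueError("nu must divide N")
    order = sorted(range(len(x)), key=lambda i: x[i])
    xbar, zstar = [], []
    for start in range(0, len(order), nu):
        block = order[start:start + nu]
        xbar.append(sum(x[i] for i in block) / nu)
        zstar.append(1 - max(y[i] for i in block))
    return xbar, zstar


def local_linear(x0, xbar, zstar, h):
    """Local linear fit (ell = 1 in (2.5)), written out in closed form."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xj, zj in zip(xbar, zstar):
        d = xj - x0
        w = gaussian_kernel(d / h) / h
        s0 += w
        s1 += d * w
        s2 += d * d * w
        t0 += zj * w
        t1 += zj * d * w
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def dh_estimate(x0, xbar, zstar, nu, h):
    """Estimate p(x0) as 1 - mu_hat(x0)^(1/nu), as in (2.4)."""
    mu_hat = local_linear(x0, xbar, zstar, h)
    mu_hat = min(max(mu_hat, 0.0), 1.0)  # clipping is an assumption made here
    return 1.0 - mu_hat ** (1.0 / nu)


# Hypothetical usage on simulated data with p(x) = x^2 / 8 on [0, 1].
rng = random.Random(1)
x = [rng.random() for _ in range(2000)]
y = [1 if rng.random() < xi * xi / 8 else 0 for xi in x]
xbar, zstar = pool_homogeneously(x, y, nu=5)
p_hat_half = dh_estimate(0.5, xbar, zstar, nu=5, h=0.15)
```

Only the pairs of group means and pooled results enter the estimator; the individual responses are used above solely to simulate the pooled tests.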
3. Theoretical properties. To study properties of our estimator it is
convenient to express the probability $p$, at a particular $x$, as

$$p(x) = \delta(N)\, \pi(x), \tag{3.1}$$

where $\delta = \delta(N)$ denotes a sequence of positive numbers that potentially
depend on $N$, and $\pi$ is a fixed, nonnegative function. To be as general as
possible, we permit the group size $\nu = \nu(N) \ge 1$ to increase, and $\delta = \delta(N)$
to decrease, as $N$ diverges.

In large screening studies the abnormalities under investigation are
invariably rare, that is, $p$ is small. To understand the limitations of our estimator,
we shall study properties in the extreme situation where $\delta \to 0$ (and hence
$p \to 0$) as $N \to \infty$. More precisely, we shall consider the "low prevalence"
situation where $\nu\delta \to 0$ as $N \to \infty$, which is an asymptotic representation of
the case where the group size $\nu$ is relatively small and infection is rare. In
practice, groups larger than 10 to 20 are rarely taken. One reason for this is
that, depending on the proportion of positive individuals in the population,
some tests (e.g., HIV tests) become too unreliable if the pool size is too large
(larger than $\nu = 5$ to 10 in the HIV example). To reflect this fact, we shall
also consider the standard "moderate pooling" situation where $\nu\delta \to c > 0$
as $N \to \infty$. However, there are tests for which groups could be taken as large
as $\nu = 40$ to 50. From the viewpoint of economics, large groups would be
beneficial, and might even be the only possible way to screen individuals in
poor countries. Hence we need to understand their effects on the quality of
estimators. We shall do this by investigating asymptotic properties of our
estimator in the extreme "over-pooling" situation where $\nu\delta \to \infty$ as $N \to \infty$.

3.1. Conditions. We shall derive theoretical properties of the estimator $\hat p$
defined at (2.4), where for $\hat\mu$ we shall generalize the local polynomial
estimators introduced at (2.5), by considering a whole class of linear smoothers,
defined by

$$\hat\mu(x) = \sum_j w_j(x) Z_j^* \Big/ \sum_j w_j(x), \tag{3.2}$$

where the weights $w_j$ depend on $\mathcal{X}$ but not on the variables $Z_j^*$. The local
polynomial estimator defined at (2.5) can be rewritten easily in this form,
and other popular nonparametric estimators (e.g., smoothing splines) can
be expressed in this form too; see, for example, Ruppert, Wand and Carroll
(2003).

Recall that $\bar X_j = \nu^{-1} \sum_i X_{ij}$ and let $h = h(N)$ denote a sequence of
constants decreasing to zero as $N \to \infty$. We can interpret $h(N)$ as the
bandwidth in a kernel-based construction of the weight functions $w_j$ in (3.2).
Typically, the weights $w_j$ would depend on $\bar X_j$, and we assume that, for
each $x \in \mathcal{I}$, where $\mathcal{I}$ is a given compact, nondegenerate interval:
Condition S.

(S1) $\sum_j w_j(x)(\bar X_j - x) / \sum_j w_j(x) = 0$;

(S2) $\sum_j w_j(x)(\bar X_j - x)^2 / \sum_j w_j(x) = h^2 b(x) + o_p(h^2)$;

(S3) $\sum_j w_j(x)^2 / \{\sum_j w_j(x)\}^2 = (\nu/Nh)\, v(x) + o_p(\nu/Nh)$;

(S4) for each integer $k \ge 1$, $\sum_j |w_j(x)|^k / \{\sum_j w_j(x)\}^k = O_p[\{\nu/(Nh)\}^{k-1}]$,

where the functions $b$ and $v$ are continuous on $\mathcal{J}$ and are related to the type
of estimator.

We also assume that:

Condition T.

(T1) the distribution of $X$ has a continuous density, $f$, that is bounded
away from zero on an open interval $\mathcal{J}$ containing $\mathcal{I}$;

(T2) $p = \delta\pi$ is bounded away from 1 uniformly in $x \in \mathcal{I}$ and in $N \ge 1$;

(T3) the function $\pi$ in (3.1) has two Hölder-continuous derivatives on $\mathcal{J}$;

(T4) for some $\varepsilon > 0$, $h + \nu\delta h + (\nu^2 / N^{1-\varepsilon} h) \to 0$ as $N \to \infty$;

(T5) the weights $w_j(x)$ vanish for $|\bar X_j - x| > Ch$, where $C > 0$ is a
constant.



The assumption in (T1) that $f$ is bounded away from zero on a compact
interval allows us to avoid pathological issues that arise when too few values
of $X$ are available in neighbourhoods of zeros of $f$. Finally, when describing
the size of $\hat p(x) - p(x)$ simultaneously in many values $x$ we shall ask that for
some $C, \varepsilon > 0$,

$$\sup_{x, x' \in \mathcal{I}:\, |x - x'| \le N^{-C}} N^{\varepsilon} \sum_k \left| \frac{w_k(x)}{\sum_j w_j(x)} - \frac{w_k(x')}{\sum_j w_j(x')} \right| = O_p(1). \tag{3.3}$$



For example, if the weights $w_j$ correspond to the local polynomial
estimator in (2.5) with $\ell = 1$ (i.e., the local linear estimator), with bandwidth $h$
and a compactly supported, symmetric, Hölder continuous, nonnegative
kernel $K$ satisfying $\int K = 1$, and if $h + (Nh)^{-1} = O(N^{-\varepsilon_1})$ for some $\varepsilon_1 > 0$, and
(T1) holds, then (T5), Condition S and (3.3) hold with, in (S2) and (S3),
$b = \int u^2 K(u)\, du$ (not depending on $x$) and $v(x) = f(x)^{-1} \int K^2$. Furthermore,
Condition S holds uniformly in $x \in \mathcal{I}$. More generally it is easy to
see that when $\ell > 1$, the $\ell$th order local polynomial estimator in (2.5)
satisfies $\sum_j w_j(x)(\bar X_j - x)^k = 0$ for $k = 1, \ldots, \ell$, and hence conditions (S1)
and (S2) are trivially satisfied. Conditions (S3) and (S4) too are satisfied
in this case, under mild conditions on the kernel. Note that condition (S1)
is not satisfied in the local constant case [$\ell = 0$ in (2.5)]. Although this
instance can be easily accommodated by modifying our conditions slightly, we
simply omit it from our theory because in practice the local linear estimator
is almost invariably preferred to the local constant one.
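
The role of condition (S1) can be checked numerically. The sketch below uses the standard closed form of the local linear weights, $w_j(x) = K_h(\bar X_j - x)\{S_2(x) - (\bar X_j - x) S_1(x)\}$, which is not spelled out in the text but reproduces the $\ell = 1$ case of (2.5) when inserted into (3.2). The Epanechnikov kernel, the design points and the function names are assumptions made for this illustration.

```python
def epanechnikov(u):
    """Compactly supported, symmetric kernel, so weights vanish beyond h."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_weights(x0, xbar, h):
    """Weights w_j(x0) = K_h(d_j) * (S_2 - d_j * S_1), with d_j = xbar_j - x0."""
    d = [xj - x0 for xj in xbar]
    k = [epanechnikov(dj / h) / h for dj in d]
    s1 = sum(dj * kj for dj, kj in zip(d, k))
    s2 = sum(dj * dj * kj for dj, kj in zip(d, k))
    return [kj * (s2 - dj * s1) for dj, kj in zip(d, k)]


def local_constant_weights(x0, xbar, h):
    """Weights of the local constant (ell = 0) estimator: w_j(x0) = K_h(d_j)."""
    return [epanechnikov((xj - x0) / h) / h for xj in xbar]


def first_moment_ratio(x0, xbar, weights):
    """Left-hand side of (S1): sum_j w_j (xbar_j - x0) / sum_j w_j."""
    num = sum(wj * (xj - x0) for wj, xj in zip(weights, xbar))
    return num / sum(weights)


# Hypothetical group means on [0, 1]; x0 sits near the left boundary.
xbar = [0.02 * j for j in range(51)]
x0, h = 0.05, 0.2
ll_ratio = first_moment_ratio(x0, xbar, local_linear_weights(x0, xbar, h))
lc_ratio = first_moment_ratio(x0, xbar, local_constant_weights(x0, xbar, h))
```

For the local linear weights the ratio vanishes up to rounding error, because the first moment equals $S_1 S_2 - S_2 S_1$; for the local constant weights it does not, which is most visible near a boundary where the design points are unbalanced around $x$.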
Remark 1. Instead of linear smoothers, such as local polynomial
estimators, we could use alternative procedures which are sometimes
preferred in the context of binary dependent variables. For example, Fan,
Heckman and Wand (1995) suggest modeling the regression curve $m$ by
$m(x) = g\{\eta(x)\}$, where $g$ is a known link function, and $\eta$ is an unknown curve.
These methods have theoretical properties similar to those of local
polynomial estimators; the two methods differ mostly through their bias, and,
depending on the shapes of $m$ and $g$, one method has a smaller bias than
the other. We prefer local polynomial estimators because they are easier to
implement in practice.

3.2. Low prevalence and moderate pooling. Our first result establishes
convergence rates and asymptotic normality for the estimator $\hat p$ defined
at (2.4), with $\hat\mu$ at (3.2). Note that we do not insist that $\nu$ and $\delta$ vary with $N$;
the regularity conditions for Theorem 3.1 hold in many cases where $\nu$ and $\delta$
are both fixed. Below we use the notation $A(x)$ to denote the value taken by
a function $A$ at a point $x$, and the notation $A$ when referring to the function
itself. However, in some places, for example, in result (3.4) where it is
necessary to refer explicitly to the point $x$ mentioned in the statement "for all
$x \in \mathcal{I}$," and in definitions (3.5) and (3.6), where we are defining functions,
the two notations may appear a little ambiguous.

Theorem 3.1. Assume that Conditions S and T hold, and that $\nu\delta = O(1)$.
Then, for each $x \in \mathcal{I}$,

$$\hat p(x) - p(x) = A(x) V(x) + B(x) + o_p\{\delta h^2 + (\delta/Nh)^{1/2}\}, \tag{3.4}$$

where the distribution of $V(x)$ converges to the standard normal law as
$N \to \infty$, and the functions $A$ and $B$ are given by

$$A = [(\nu N h)^{-1} (1 - p)^{2 - \nu} \{1 - (1 - p)^{\nu}\} v]^{1/2} = O\{(\delta/Nh)^{1/2}\}, \tag{3.5}$$

$$B = \tfrac{1}{2} h^2 \{p'' - (\nu - 1)(1 - p)^{-1} (p')^2\} b = O(\delta h^2), \tag{3.6}$$

where $b$ and $v$ are as in (S2) and (S3). If, in addition, Condition S holds
uniformly in $x \in \mathcal{I}$, if (3.3) holds, and if the functions $b$ and $v$ are bounded
and continuous, then

$$\int_{\mathcal{I}} (\hat p - p)^2 = \int_{\mathcal{I}} (A^2 + B^2) + o_p(\delta^2 h^4 + \delta/Nh). \tag{3.7}$$

Note that $A$ and $B$ represent, to first order, the standard deviation of
the error about the mean, and the main effect of bias, which arise from the
asymptotic distribution. For simplicity we shall call $A^2$ and $B$ the asymptotic
variance and bias of the estimator. From the theorem we see that, when
$B(x) \ne 0$ (e.g., for the local polynomial estimator with $\ell = 1$), if $N\delta \to \infty$
as $N \to \infty$, then the rate of the estimator is optimized when $h$ is of size
$(N\delta)^{-1/5}$, in which case the estimator satisfies

$$\text{for each } x \in \mathcal{I}, \quad \hat p(x) - p(x) = O_p\{(\delta^3/N^2)^{1/5}\}. \tag{3.8}$$
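
The bandwidth order quoted before (3.8) can be checked by balancing the squared bias against the variance in (3.4). The following is a sketch in orders of magnitude only, with constants ignored and with $\nu\delta = O(1)$ as in the theorem:

```latex
% Orders from (3.5)-(3.6): B^2 \asymp \delta^2 h^4 and A^2 \asymp \delta/(Nh).
\delta^2 h^4 \asymp \frac{\delta}{N h}
\;\Longleftrightarrow\; h^5 \asymp \frac{1}{N\delta}
\;\Longleftrightarrow\; h \asymp (N\delta)^{-1/5},
\qquad\text{so that}\qquad
\delta h^2 \asymp \delta\,(N\delta)^{-2/5} = (\delta^3/N^2)^{1/5}.
```

With this choice of $h$ both terms are of the same order, which is the rate stated at (3.8).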

Note that when $\nu = 1$ (no grouping), $\mu = 1 - p$ and our estimator of $p$ reduces
to a standard local linear smoother of $1 - \mu$. For example, the estimator
at (2.5) coincides with $1 - \hat p_S$ in (2.1). Taking $\nu = 1$ in the theorem, we
deduce that the convergence rate of our estimator for $\nu > 1$, given at (3.8),
coincides with the rate for conventional linear smoothers employed with
nongrouped data. By standard arguments it is straightforward to show that
this rate is optimal when $\pi$ has two derivatives, and hence our estimator
is rate optimal. Although, in (T3), we assume that $\pi$ has two continuous
derivatives, continuity is imposed only so that the dominant term in an
expansion of bias can be identified relatively simply, and the convergence rate
at (3.8) can be derived without the assumption of continuity. In addition,
note that when $\nu\delta = o(1)$ our estimator has the same asymptotic bias and
variance expressions, $B$ and $A$, as the estimator when $\nu = 1$, which in that
case reduce to $A = (\delta/Nh)^{1/2} (\pi v)^{1/2}$ and $B = \tfrac{1}{2} \delta h^2 \pi'' b + o_p(\delta h^2)$. In other
words, in that case the statistical cost of pooling is virtually zero.

The results discussed above also apply if performance is measured in terms
of integrated squared error (ISE), as at (3.7). In particular, if $h$ is of size
$(N\delta)^{-1/5}$, provided that $\nu\delta$ is bounded, the estimator $\hat p$ achieves the minimax
optimal convergence rate,

$$\int_{\mathcal{I}} (\hat p - p)^2 = O_p\{(\delta^3/N^2)^{2/5}\}. \tag{3.9}$$

Remark 2. Similar conclusions can be drawn in the case of estimators
for which $B(x) = 0$, but this requires us to assume that the function $\pi$
has enough derivatives so that an explicit, asymptotic, dominating, nonzero
bias term can be derived. For example, for our local polynomial estimator
of order $\ell > 1$, we have $B(x) = 0$ and the term $o_p(\delta h^2)$ is only an upper
bound to the bias of the estimator. A nonvanishing asymptotic expression
for the bias can easily be obtained for $\ell > 1$ if we assume that $\pi$ has $\ell + 1$
continuous derivatives. This can be done in a straightforward manner, but
to keep presentation simple, and since in practice local linear estimators are
almost invariably preferred to other local polynomial estimators, we omit
such expansions.

Remark 3. In the case where $\delta \to 0$, it could be argued that the rates
are meaningless since we are trying to estimate a function that tends to zero,
and that it is more appropriate to consider the nonzero part $\pi$ of $p$ in the
model at (3.1), and see how fast $\hat\pi = \hat p/\delta$ converges to $\pi$. The convergence
rate of $\hat\pi$ is easily deducible from (3.8):

$$\text{for each } x \in \mathcal{I}, \quad \hat\pi(x) - \pi(x) = O_p\{(N\delta)^{-2/5}\}. \tag{3.10}$$

Provided that $N\delta \to \infty$ as $N \to \infty$, $\hat\pi(x)$ is consistent for $\pi(x)$, and the
convergence rate evinced by (3.10) is optimal.

3.3. Over-pooling. The situation is quite different when $\nu\delta \to \infty$ as
$N \to \infty$, which can be interpreted as an asymptotic representation of the situation
where the data are pooled in groups of relatively large size $\nu$. In practical
terms the results in this section serve as a salutary warning not to skimp
on the testing budget. The work in Section 3.2 shows that the performance
of estimators is robust, up to a point, against increasing group size, but
in the present section we demonstrate that, after the dividing line between
moderate pooling and over-pooling has been crossed, performance decreases
sharply.

When $\nu\delta \to \infty$, properties of the estimator of $p(x)$ depend on $x$, because
there the order of magnitude of $\mu(x)$, at (2.3), depends critically on the rate
at which $\{1 - p(x)\}^{\nu}$ converges to zero. The following condition captures
this aspect:

$$\text{for some } \varepsilon > 0, \quad \nu/h = o[N^{1-\varepsilon} \{1 - \delta\pi(x)\}^{\nu}], \tag{3.11}$$

and the following theorem replaces Theorem 3.1.

Theorem 3.2. Assume that $\nu\delta \to \infty$ as $N \to \infty$, Conditions S, T
and (3.11) hold, and $\pi$, $b$ and $v$ are all nonzero at $x$. Then
$\hat p(x) - p(x) = A(x) V(x) + \{1 + o_p(1)\} B(x)$, where $V(x)$ is asymptotically distributed as
a normal $N(0, 1)$ as $N \to \infty$, and $A$ and $B$ are given by the first identities
in each of (3.5) and (3.6).

Note that the orders of magnitude given by the second identities in each
of (3.5) and (3.6) are not valid in this case, and neither does result (3.7)
necessarily hold under the conditions of Theorem 3.2. Note too that the
theorem can be extended to cases where $b = 0$, along the lines discussed in
Remark 2. To elucidate the implications of Theorem 3.2, assume that $\pi'(x)$ is
nonzero, and define $\lambda_N(x)^5 = \{1 - \delta\pi(x)\}^{-\nu}$, which, when $\nu\delta \to \infty$, diverges
exponentially fast as a function of $\nu\delta$. Given a sequence of constants $c_N$
and a sequence of random variables $V_N$, write $V_N \asymp_p c_N$ to indicate that
both $V_N = O_p(c_N)$ and $c_N = O_p(V_N)$ as $N \to \infty$. Theorem 3.2 implies that,
if $\nu\delta \to \infty$ and $h$ is a constant multiple of $\lambda_N (N \delta^4 \nu^3)^{-1/5}$, then

$$\{\hat p(x) - p(x)\}^2 \asymp_p (\delta^3/N^2)^{2/5} (\nu\delta)^{-2/5} \lambda_N(x)^4, \tag{3.12}$$

and in particular diverges at a rate that is exponentially slower, as a function
of $\nu\delta$, than in the case where $\nu\delta = O(1)$, treated in Section 3.2. Result (3.12)
follows from the fact that $A(x)^2 \asymp (\nu N h)^{-1} \lambda_N(x)^5$ and $|B(x)| \asymp h^2 \nu \delta^2$,
where $a_1(N) \asymp a_2(N)$ means that $a_1(N)/a_2(N)$ is bounded away from zero
and infinity. Note that (3.12) includes the case where $p$ (and hence $\delta$) is held
fixed, and $\nu \to \infty$ as $N \to \infty$.
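
The bandwidth and the rate in (3.12) can be recovered by equating variance and squared bias. The sketch below works in orders of magnitude only; it writes $\lambda_N^5 = \{1 - \delta\pi(x)\}^{-\nu}$ and takes as given the two orders quoted in the text, $A(x)^2 \asymp (\nu N h)^{-1} \lambda_N^5$ and $|B(x)| \asymp h^2 \nu \delta^2$:

```latex
(\nu N h)^{-1} \lambda_N^5 \asymp h^4 \nu^2 \delta^4
\;\Longleftrightarrow\; h^5 \asymp \lambda_N^5 \,(N \delta^4 \nu^3)^{-1}
\;\Longleftrightarrow\; h \asymp \lambda_N \,(N \delta^4 \nu^3)^{-1/5},
\\[4pt]
B^2 \asymp h^4 \nu^2 \delta^4
  \asymp \lambda_N^4 \,(N \delta^4 \nu^3)^{-4/5}\, \nu^2 \delta^4
  = \lambda_N^4 \, N^{-4/5}\, \delta^{4/5}\, \nu^{-2/5}
  = (\delta^3/N^2)^{2/5} (\nu\delta)^{-2/5} \lambda_N^4 .
```

The factor $\lambda_N^4 = \{1 - \delta\pi(x)\}^{-4\nu/5}$ is what separates this rate from the one obtained under moderate pooling.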
The result at (3.12) shows that when $\nu \to \infty$ as $N \to \infty$, $\hat p$ suffers from
a clear degradation of rates compared to the case where $\nu\delta = O(1)$. Next
we show that this degradation is intrinsic to the problem, not to our
estimator $\hat p$; any estimator based on the pooled data in Section 2.2 will
experience an exponentially rapid decline in performance as $\nu\delta \to \infty$. More
precisely we show in Theorem 3.3 that, when $\nu\delta \to \infty$ as $N \to \infty$, $\hat p$ is near
rate-optimal among all such estimators. Recall that, under our model (3.1),
$p = \delta\pi$, where $\delta = \delta(N)$ potentially converges to zero. If $\nu\delta \to \infty$, then,
by (3.12), we have

$$|\hat p(x) - p(x)| = O_p[(\delta^3/N^2)^{1/5} (\nu\delta)^{-1/5} \{1 - p(x)\}^{-2\nu/5}]. \tag{3.13}$$

Although this result was derived under the assumption that $\pi$ is a fixed
function with two continuous derivatives, since (3.13) is only an upper bound,
then it is readily established under the following more general assumption:
the nonnegative function $\pi = \pi_N$ can depend on $N$ and satisfies

$$\pi_N(x) + |\pi_N'(x)| + |\pi_N''(x)| \le C_1 \quad \text{for all } N \text{ and all } x, \tag{3.14}$$

where the constant $C_1 > 0$ does not depend on $N$ or $x$.

Take the explanatory variables $X_i$ to be uniformly distributed on the
interval $\mathcal{M} = [-\tfrac12, \tfrac12]$, and let $\mathcal{I} \subset \mathcal{J} \subset \mathcal{M}$ where 0 is an interior point of $\mathcal{I}$.
Let $p^1 = \delta\pi_N$, where $\pi_N$ satisfies (3.14), let $p^0 = \delta$ denote the version of $p^1$
when $\pi_N \equiv 1$, and consider the condition

$$(\nu^3 \delta)^{1/2} = o\{N (1 - \delta)^{\nu}\}. \tag{3.15}$$

This assumption permits $\nu\delta$ to diverge with $N$, but not too quickly. Indeed,
using arguments similar to those in Section 6.3, it can be shown that if (3.15)
fails, then no estimator of $p$ is consistent. Let $\mathcal{P}$ be the class of measurable
functions $\check p$ of the pooled data pairs $(\mathcal{X}_j, Y_j^*)$ introduced in Section 2.2.

Theorem 3.3. Assume that $p^0$ and $p^1$ are bounded below 1, that (3.15)
holds and that $\nu\delta \to \infty$. Let $x$ be an interior point of the support, $[-\tfrac12, \tfrac12]$, of
the uniformly distributed explanatory variables $X_i$. Then $C_2 > 0$, and $\pi_N$,
satisfying (3.14), can be chosen such that

$$\liminf_{N \to \infty}\, \inf_{\check p \in \mathcal{P}}\, \max_{p \in \{p^0, p^1\}} P\bigl[|\check p(x) - p(x)| > C_2\, \delta\, (N\nu\delta)^{-2/5} \{1 - p(x)\}^{-2\nu/5}\bigr] > 0. \tag{3.16}$$

Except for the fact that $(\nu\delta)^{-2/5}$, rather than $(\nu\delta)^{-1/5}$, appears in (3.16),
the latter result represents a converse to (3.13). The difference in powers here
is of minor importance since the main issue is the factor $\{1 - p(x)\}^{-2\nu/5}$,
which (in the context $\nu\delta \to \infty$ of over-pooling), diverges faster than any
power of $\nu\delta$, and this feature is represented in both (3.13) and (3.16).
3.4. Comparison with the approach of Delaigle and Meister. Arguments
similar to those of Delaigle and Meister (2011) can be used to show that,
under conditions similar to those used in our Theorem 3.1, their estimator $\tilde p$
[see (A.1) in the supplemental article, Delaigle and Hall (2011)] satisfies
$\tilde p(x) - p(x) = A_1(x) V_1(x) + B_1(x) + o_p\{\delta h^2 + (\nu\delta/Nh)^{1/2}\}$, where the
random variable $V_1(x)$ has an asymptotic standard normal distribution and

$$A_1 = [(Nh)^{-1} (1 - p)\, q^{1-\nu} \{1 - (1 - p)\, q^{\nu - 1}\} v]^{1/2} = O\{(\nu\delta/Nh)^{1/2}\}, \tag{3.17}$$

$$B_1 = \tfrac{1}{2} h^2 p'' b = O(\delta h^2) \tag{3.18}$$

with $q = E\{1 - p(X)\}$. Likewise, the analog of (3.7) can be derived in the
following way: $\int_{\mathcal{I}} (\tilde p - p)^2 = \int_{\mathcal{I}} (A_1^2 + B_1^2) + o_p\{\delta^2 h^4 + (\nu\delta/Nh)\}$. To simplify the
comparison, assume that we use estimators for which $b$ and $v$ do not vanish,
and that $\pi > 0$. We see when comparing (3.17)-(3.18) with (3.5)-(3.6) that
the asymptotic variance term $A^2$ of our estimator is an order of magnitude $\nu$
times smaller than $A_1^2$. Note too the asymptotic bias terms of $\hat p$ and $\tilde p$ are of
the same size (the two biases are asymptotically equivalent if $\nu\delta \to 0$, and
have the same magnitude in other cases). Hence, with our procedure, the
gain in accuracy can be quite substantial, especially if $\nu$ is large.
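
The variance comparison can be made concrete by evaluating the two asymptotic variance expressions directly. The short Python sketch below codes the squares of the first identities in (3.5) and (3.17); the function names and the numerical values are hypothetical and chosen only for illustration. In the special case where $p(x)$ equals $1 - q$ (for example, a constant $p$), the two expressions differ by exactly the factor $\nu$.

```python
def dh_variance(p, nu, n, h, v):
    """A(x)^2 from (3.5): asymptotic variance under homogeneous pooling."""
    return (1.0 - p) ** (2 - nu) * (1.0 - (1.0 - p) ** nu) * v / (nu * n * h)


def dm_variance(p, q, nu, n, h, v):
    """A_1(x)^2 from (3.17): asymptotic variance under random pooling."""
    inner = 1.0 - (1.0 - p) * q ** (nu - 1)
    return (1.0 - p) * q ** (1 - nu) * inner * v / (n * h)


# Hypothetical values: p(x) = 0.03, q = E{1 - p(X)} = 0.97, N = 5000, h = 0.2.
p, q, n, h, v = 0.03, 0.97, 5000, 0.2, 1.0
ratios = {nu: dm_variance(p, q, nu, n, h, v) / dh_variance(p, nu, n, h, v)
          for nu in (1, 5, 10, 20)}
```

With $\nu = 1$ both functions return $(Nh)^{-1} p (1 - p) v$, in line with the remark that the two approaches coincide when there is no pooling.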

4. Numerical study. We applied the local linear version of our local polynomial
estimation procedure [i.e., the one based on (2.5) with $\ell = 1$] on simulated
and real examples. This method, which we denote below by DH, is
the one we prefer because it works well, it is very easy to implement and 
we can easily derive and compute a good data-driven bandwidth for it. The 
practical advantages of local linear estimators over other local polynomial 
estimators have been discussed at length in the standard nonparametric 
regression literature. Of course, other versions of our general local linear 
smoother procedure can be used, such as a spline approach or more complicated
iterative kernel procedures (see Remark 1). Each of the methods gives
essentially the same estimator. 

In our simulations we compared the DH procedure, calculated by definition
from homogeneous groups, with the local linear estimator $\hat p_S$ at (2.1)
that we would use if we had access to the original nongrouped data. We also
compared DH with the local linear version of the method of Delaigle and
Meister (2011), which, by definition, is calculated from randomly created
groups. We denote these two methods by LL and DM, respectively. We took
the kernel, $K$, equal to the standard normal density. For $h$, in the DM case
we used the plug-in bandwidth of Delaigle and Meister (2011) with their
weight $\omega_0$; we used a similar plug-in bandwidth in the LL and DH cases;
see Section A.2 of the supplemental article [Delaigle and Hall (2011)] for
details.
4.1. Simulation results. To facilitate the comparison with the DM method,
we simulated data according to the four models used by Delaigle and Meister
(2011):

(i) $p(x) = \{\sin(\pi x/2) + 1.2\} / [20 + 40 x^2 \{\mathrm{sign}(x) + 1\}]$ and $X \sim U[-3, 3]$
or $X \sim N(0, 1.5^2)$;

(ii) $p(x) = \exp(-4 + 2x) / \{8 + 8 \exp(-4 + 2x)\}$ and $X \sim U[-1, 4]$ or
$X \sim N(2, 1.5^2)$;

(iii) $p(x) = x^2/8$ and $X \sim U[0, 1]$ or $X \sim N(0.5, 0.5^2)$;

(iv) $p(x) = x^2/8$ and $X \sim U[-1, 1]$ or $X \sim N(0, 0.75^2)$.

We generated 200 samples from each model, with X normal or uniform, 
and with N = 1000, N = 5000 and N = 10,000. Then for the DH method 
we split each sample homogeneously into groups of equal sizes v = 5, v = 10 
or v = 20; for the DM method, we created the groups randomly (remember 
that this estimator is valid only for random groups). 
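
For readers who wish to reproduce this kind of experiment, the four models and the sampling step can be sketched in Python as follows. The dictionary layout, the function names and the random seed are assumptions made for this illustration; only the curves and the covariate distributions are taken from the list of models above.

```python
import math
import random


def sign(x):
    return (x > 0) - (x < 0)


# Conditional probability curves p(x) for models (i) to (iv).
MODELS = {
    "i": lambda x: (math.sin(math.pi * x / 2) + 1.2)
    / (20 + 40 * x * x * (sign(x) + 1)),
    "ii": lambda x: math.exp(-4 + 2 * x) / (8 + 8 * math.exp(-4 + 2 * x)),
    "iii": lambda x: x * x / 8,
    "iv": lambda x: x * x / 8,
}

# For each model: (uniform support), (normal mean, normal standard deviation).
DESIGNS = {
    "i": ((-3.0, 3.0), (0.0, 1.5)),
    "ii": ((-1.0, 4.0), (2.0, 1.5)),
    "iii": ((0.0, 1.0), (0.5, 0.5)),
    "iv": ((-1.0, 1.0), (0.0, 0.75)),
}


def generate_sample(model, n, design, rng):
    """Draw (X_i, Y_i), i = 1..n, with Y_i ~ Bernoulli(p(X_i))."""
    p = MODELS[model]
    (lo, hi), (mean, sd) = DESIGNS[model]
    if design == "uniform":
        x = [rng.uniform(lo, hi) for _ in range(n)]
    elif design == "normal":
        x = [rng.gauss(mean, sd) for _ in range(n)]
    else:
        raise ValueError("design must be 'uniform' or 'normal'")
    y = [1 if rng.random() < p(xi) else 0 for xi in x]
    return x, y


# Hypothetical usage: one sample of size N = 1000 from model (ii), X uniform.
x, y = generate_sample("ii", 1000, "uniform", random.Random(0))
```

The individual responses generated here would then be pooled, homogeneously for DH or at random for DM, before any estimator is computed.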

To assess the performance of our DH estimator we calculated, in each
case and for each of the 200 generated samples, the integrated squared error
ISE $= \int_a^b (\hat p - p)^2$, with $a$ and $b$ denoting the 0.05 and 0.95 quantiles of the
distribution of $X$. We did the same for the DM and LL estimators $\tilde p$ and $\hat p_S$.
For brevity, figures illustrating the results are provided in Section A.4 of
the supplemental article [Delaigle and Hall (2011)], and here we show only
summary statistics. In the graphs of Section A.4, we show the target curve
(thin uninterrupted curve) as well as three interrupted curves; these were
calculated from the samples that gave the first, second and third quartiles
of the 200 ISE values.
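
The ISE criterion can be approximated numerically once an estimate is available as a function. The sketch below uses the trapezoidal rule on an equally spaced grid; the grid size, the function names and the helper for uniform quantiles are assumptions made here, since the text does not say how the integral was evaluated.

```python
def integrated_squared_error(p_hat, p, a, b, n_grid=401):
    """Approximate ISE = int_a^b (p_hat - p)^2 by the trapezoidal rule."""
    step = (b - a) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        x = a + k * step
        err2 = (p_hat(x) - p(x)) ** 2
        # End points receive half weight in the trapezoidal rule.
        total += err2 if 0 < k < n_grid - 1 else 0.5 * err2
    return total * step


def uniform_quantile(lo, hi, prob):
    """Quantile of the U[lo, hi] distribution at level prob."""
    return lo + prob * (hi - lo)


# Integration limits for model (i) with X ~ U[-3, 3]: the 5% and 95% quantiles.
a = uniform_quantile(-3.0, 3.0, 0.05)
b = uniform_quantile(-3.0, 3.0, 0.95)
```

Restricting the integral to the central 90% of the covariate distribution keeps the criterion from being dominated by regions where few observations are available.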

In Table 1 we show, for each model with $X$ uniform, the median (MED)
and interquartile range (IQR) of the 200 ISE values obtained using the LL
estimator based on nongrouped data, and, for several values of $\nu$, the DH and
the DM approaches based on data pooled in groups of size $\nu$. Table 2 shows
the same but for $X$ normal. Note that LL cannot be calculated from grouped
data, but we include it to assess the potential loss incurred by pooling the
data. The tables show that for $\nu \le 10$, pooling the data homogeneously
hardly affects the quality of the estimator. Sometimes, the results are even
slightly better with the DH method than with the LL one. Indeed a careful
analysis of the bias and variance of the various estimators shows that for
some curves $p(x)$, grouping homogeneously can sometimes be slightly
beneficial when $\nu$ is small. (Roughly this is because by grouping a little we lose
very little information, but we increase the number of positive $Y_j^*$, which
makes the estimation a little easier for this particular estimator. Theoretical
arguments support this conclusion.) The situation is much less favorable for
the DM random grouping method, whose quality degrades quickly as $\nu$
increases. Unsurprisingly, DH beat DM systematically, except when $N/\nu$ was
small ($N = 1000$ and $\nu = 20$), where the $J = 50$ grouped observations did
not suffice to estimate the curves from models (i) and (ii) very well.
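To make the pooling step concrete, the sketch below forms homogeneous pools by sorting on the covariate and cutting the ordered sample into consecutive groups of size $\nu$. It assumes, as a working reading of "homogeneous", that groups are built from adjacent order statistics; the function name and the decision to drop leftover observations are illustrative choices, not taken from the paper.

```python
import numpy as np

def pool_homogeneously(x, y, nu):
    """Sort by covariate, cut into consecutive groups of size nu, test each pool."""
    order = np.argsort(x)
    n_groups = len(x) // nu              # leftover observations are dropped (assumption)
    idx = order[: n_groups * nu].reshape(n_groups, nu)
    x_bar = x[idx].mean(axis=1)          # average covariate of each group
    y_star = y[idx].max(axis=1)          # a pool tests positive if any member is positive
    return x_bar, y_star
```

With $N = 1000$ and $\nu = 20$ this returns the $J = 50$ grouped observations mentioned above.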



Table 1

Simulation results for models (i) to (iv), when the $X_{ij}$'s are uniform. The numbers show $10^4 \times$ MED (IQR) of the ISE calculated from 200 simulated samples

                  ν = 1           ν = 5                           ν = 10                          ν = 20
Model  N          LL              DH              DM              DH              DM              DH              DM
(i)    10^3       9.35 (7.42)     10.1 (8.16)     26.9 (24.1)     11.0 (8.53)     51.2 (49.6)     17.8 (484)      122 (110)
       5·10^3     2.91 (2.01)     2.94 (2.38)     7.59 (5.34)     3.30 (2.06)     14.1 (11.4)     4.46 (2.94)     29.2 (25.2)
       10^4       1.62 (1.20)     1.83 (1.40)     4.54 (3.05)     2.07 (1.63)     7.70 (6.13)     2.89 (1.95)     16.8 (13.9)
(ii)   10^3       6.37 (8.38)     8.66 (9.99)     29.4 (28.4)     10.3 (11.4)     64.7 (69.5)     29.7 (1560)     166 (169)
       5·10^3     1.48 (1.37)     1.66 (2.26)     6.37 (5.93)     2.41 (2.74)     13.8 (12.1)     4.47 (5.94)     35.8 (30.0)
       10^4       0.963 (0.843)   1.02 (1.16)     3.39 (2.89)     1.35 (1.25)     7.04 (6.20)     2.35 (3.26)     19.1 (17.2)
(iii)  10^3       0.777 (0.978)   0.860 (1.26)    3.44 (4.03)     1.02 (1.31)     7.26 (8.37)     1.90 (4.81)     19.9 (19.5)
       5·10^3     0.176 (0.220)   0.166 (0.254)   0.722 (0.818)   0.214 (0.298)   1.68 (1.67)     0.356 (0.482)   4.48 (3.97)
       10^4       0.093 (0.108)   0.100 (0.128)   0.355 (0.344)   0.117 (0.158)   0.797 (0.800)   0.200 (0.212)   2.28 (1.79)
(iv)   10^3       2.33 (2.11)     2.49 (2.32)     7.41 (9.81)     2.70 (2.55)     17.2 (16.3)     5.07 (166)      39.7 (34.1)
       5·10^3     0.590 (0.510)   0.633 (0.602)   2.01 (1.73)     0.637 (0.702)   4.05 (3.70)     0.964 (1.06)    9.62 (9.11)
       10^4       0.309 (0.254)   0.317 (0.293)   1.10 (0.873)    0.373 (0.311)   2.31 (1.89)     0.570 (0.539)   5.47 (4.80)

Table 2

Simulation results for models (i) to (iv), when the $X_{ij}$'s are normal. The numbers show $10^4 \times$ MED (IQR) of the ISE calculated from 200 simulated samples

                  ν = 1           ν = 5                           ν = 10                          ν = 20
Model  N          LL              DH              DM              DH              DM              DH              DM
(i)    10^3       10.3 (6.69)     10.7 (7.18)     20.8 (19.0)     10.8 (8.04)     37.0 (35.3)     12.8 (9.70)     85.6 (72.8)
       5·10^3     4.35 (2.80)     4.14 (2.71)     9.60 (5.49)     4.32 (2.95)     12.0 (11.1)     4.50 (3.44)     17.3 (18.8)
       10^4       3.12 (1.77)     3.33 (2.07)     7.66 (4.12)     3.01 (2.01)     9.42 (5.68)     3.20 (2.19)     13.6 (11.0)
(ii)   10^3       5.02 (5.20)     5.78 (6.83)     17.0 (23.0)     8.18 (10.6)     46.0 (57.8)     21.1 (64.0)     167 (202)
       5·10^3     1.69 (1.95)     1.98 (2.18)     4.23 (5.97)     2.36 (3.40)     9.48 (12.3)     5.37 (6.75)     28.3 (36.9)
       10^4       1.02 (0.925)    1.17 (1.21)     2.99 (3.12)     1.46 (1.64)     5.51 (6.81)     3.04 (3.22)     15.0 (17.7)
(iii)  10^3       0.897 (1.53)    0.885 (1.06)    2.95 (3.36)     0.910 (1.27)    5.73 (7.10)     1.37 (2.14)     23.7 (27.3)
       5·10^3     0.274 (0.389)   0.263 (0.325)   0.946 (0.997)   0.260 (0.383)   1.61 (2.08)     0.448 (0.692)   4.26 (4.93)
       10^4       0.204 (0.270)   0.148 (0.175)   0.637 (0.725)   0.182 (0.219)   1.13 (1.10)     0.323 (0.435)   2.42 (2.58)
(iv)   10^3       4.13 (4.30)     3.60 (3.48)     13.2 (12.5)     4.32 (3.84)     28.1 (26.9)     7.60 (9.43)     82.3 (75.2)
       5·10^3     1.30 (1.33)     1.10 (1.01)     3.85 (3.77)     1.21 (1.22)     7.45 (6.56)     2.24 (2.20)     16.6 (18.1)
       10^4       0.764 (0.651)   0.566 (0.474)   2.50 (1.86)     0.676 (0.672)   4.63 (4.03)     1.01 (1.04)     10.1 (9.96)




Fig. 1. NHANES study: DH estimator for $\nu = 2$, 5, 10 and 20 and LL estimator (thick
curve) when $Y = Y_{\mathrm{HBc}}$ (left) or $Y = Y_{\mathrm{CL}}$ (right).



4.2. Real data application. We also applied our DH method to real data.
To make the comparison with the LL estimator possible, we used data for
which we had access to the entire, nongrouped set of observations $(X_i, Y_i)$.
Then we grouped the data and compared the DH and LL procedures. We
used data from the NHANES study, which are available at
www.cdc.gov/nchs/nhanes/nhanes1999-2000/nhanes99_00.htm. These data were collected
in the US between 1999 and 2000.

As in Delaigle and Meister (2011), our goal was to estimate two
conditional probabilities: $p_{\mathrm{HBc}}(x) = E(Y_{\mathrm{HBc}} \mid X = x)$ and $p_{\mathrm{CL}}(x) = E(Y_{\mathrm{CL}} \mid X = x)$,
where $X$ was the age of a patient, $Y_{\mathrm{HBc}} = 0$ or 1 indicating the absence or
presence of antibody to hepatitis B virus core antigen in the patient's serum
or plasma and $Y_{\mathrm{CL}} = 0$ or 1 indicating the absence or presence of genital
Chlamydia trachomatis infection in the urine of the patient. The sample
size was $N = 7016$ for HBc and $N = 2042$ for CL. The proportion of $Y_i$'s
equal to one was 0.047 in the HBc case and 0.044 in the CL case. See
Delaigle and Meister (2011) for more details on these data and the methods
employed to collect them.

For brevity here we only present the results obtained using our method
by pooling the data homogeneously in groups of equal size $\nu = 2$, 5, 10 and
20. As in the simulations, our DH estimator improved considerably on the
DM method. An illustration of our procedure with a second covariate is
given in Section A.3 of the supplemental article [Delaigle and Hall (2011)].
In Figure 1 we compare DH with LL. All curves were calculated using our
bandwidth procedure described in Section A.2 of the supplemental article
[Delaigle and Hall (2011)]. We see that, in these examples, grouping data
in pools of size as large as $\nu = 20$ does not dramatically degrade
performance.
5. Generalizations to unequal groups and the multivariate case. Our
procedure for estimating $p$ can be extended to the multivariate setting, where
the covariates are random $d$-vectors, and to unequal group sizes. These
extensions can be performed in many different ways, for example, by binning
on each variable, using bins of potentially different sizes to accommodate
different levels of homogeneity. If we group using bins of equal dimension,
then, to a large extent, the theoretical properties discussed earlier, in the
setting of equal-size groups, continue to hold. To briefly indicate this we
give, below, details of methodology and results in the case of multivariate
histogram binning where, for definiteness, the bin sizes and shapes, but not
the group sizes, are equal. Cases where the bin sizes and shapes also vary
can be treated in a similar manner, provided the variation is not too great,
but since there are so many possibilities we do not treat those cases here.
An approach of this type is discussed in Section A.3 of the supplemental
article [Delaigle and Hall (2011)].

In the analysis below we take $X$ to be a $d$-vector, and the function $p$
to be $d$-variate, where $d \ge 1$. We group the data in bins of equal width,
specifically width $(\nu/N)^{1/d}$ along each of the $d$ coordinate axes, rather than
in groups of equal number. In the theory described below, for notational
simplicity, we assume that the support of the distribution of $X$ contains the
cube $\mathcal{I} = [0,1]^d$, and we estimate $p$ there. We choose $\nu$ so that $J = (N/\nu)^{1/d}$
is an integer (on this occasion $\nu$ is not necessarily an integer itself), and take
the bins to be the cubes $B(k_1, \ldots, k_d)$ defined by

$$B(k_1, \ldots, k_d) = \prod_{\ell=1}^{d} \Bigl( \tfrac12 (2k_\ell + 1)(\nu/N)^{1/d} - \tfrac12 (\nu/N)^{1/d},\ \tfrac12 (2k_\ell + 1)(\nu/N)^{1/d} + \tfrac12 (\nu/N)^{1/d} \Bigr],$$

where $k_\ell = 0, \ldots, J-1$ for $\ell = 1, \ldots, d$. In this setting it is convenient to write
the paired data as simply $(X_1, Y_1), \ldots, (X_N, Y_N)$, where $X_j$ is a $d$-vector and
each $Y_j = 0$ or 1, and refer to $X_j$ in terms of the bin in which it lies, rather
than give it a double subscript (as in the notation $X_{ij}$, where $j$ is the bin
index).

Put $b(k_1, \ldots, k_d) = \bigl(\tfrac12(2k_1 + 1)(\nu/N)^{1/d}, \ldots, \tfrac12(2k_d + 1)(\nu/N)^{1/d}\bigr)$,
representing the center of the bin $B(k_1, \ldots, k_d)$, define

$$Z^*(k_1, \ldots, k_d) = 1 - \max_{j : X_j \in B(k_1, \ldots, k_d)} Y_j$$

and compute $\hat\mu$ by applying a $d$-variate local polynomial smoother to the
values of $(b(k_1, \ldots, k_d), Z^*(k_1, \ldots, k_d))$, interpreted as (explanatory variable,
response variable) pairs in a conventional $d$-variate nonparametric regression
problem. To derive an estimator of $p$ from $\hat\mu$ we take

$$\hat p(x) = 1 - \hat\mu(x)^{1/m(x)}, \tag{5.1}$$

where $m(x)$ denotes the number of data $X_j$ in the bin containing $x \in \mathcal{I}$.
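The construction leading to (5.1) can be sketched for $d = 1$ as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the covariates are taken to lie in $[0, 1]$, the smoother is a local linear fit with an Epanechnikov kernel (the kernel choice is an assumption), empty bins are skipped, and the function names are hypothetical.

```python
import numpy as np

def local_linear(x0, x, z, h):
    """Local linear fit at x0 with an Epanechnikov kernel (kernel is an assumption)."""
    u = (x - x0) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    s1 = (k * (x - x0)).sum()
    s2 = (k * (x - x0) ** 2).sum()
    w = k * (s2 - (x - x0) * s1)          # standard local linear weights
    return float((w * z).sum() / w.sum())

def binned_group_estimator(x0, x, y, nu, h):
    """Sketch of (5.1) for d = 1: bins of width nu/N on [0, 1], Z* = 1 - max Y in bin."""
    n = len(x)
    n_bins = int(round(n / nu))           # J = N / nu, assumed to be an integer
    width = 1.0 / n_bins
    bin_of = np.minimum((x / width).astype(int), n_bins - 1)
    counts = np.bincount(bin_of, minlength=n_bins)
    positives = np.bincount(bin_of, weights=y, minlength=n_bins)
    keep = counts > 0                     # empty bins carry no test result
    centers = (np.arange(n_bins) + 0.5) * width
    z_star = (positives == 0).astype(float)   # 1 when every member of the bin is negative
    mu_hat = local_linear(x0, centers[keep], z_star[keep], h)
    mu_hat = min(max(mu_hat, 1e-12), 1.0)     # keep the root in (5.1) well defined
    m_x0 = counts[min(int(x0 / width), n_bins - 1)]
    return 1.0 - mu_hat ** (1.0 / max(m_x0, 1))
```

Because $m(x)$ is the random count in a single bin, the estimate can be noisy when $\nu$ is small, which is in line with the lower bound on $\nu$ imposed in the theory that follows.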
In developing theoretical properties of this estimator we choose our
regularity conditions to simplify exposition. In particular, we replace
assumptions (S1)-(S4) and (T5) by the following restriction:

Condition U. The nonparametric smoother defined by the estimator
at (3.2) is a standard $d$-variate local linear smoother [see, e.g., Fan (1993)],
where the kernel $K$, a function of $d$ variables, is a spherically symmetric,
compactly supported, Hölder continuous probability density, and, for some
$\varepsilon > 0$, the bandwidth $h$ satisfies $h + (Nh^d)^{-1} = O(N^{-\varepsilon})$ as $N \to \infty$.

Conditions (T1)-(T4) are replaced by (V1)-(V4) below, and (V5) is
additional:

Condition V.

(V1) the distribution of $X$ has a continuous density, $f$, that is bounded
away from zero on an open set $\mathcal{J}$ that contains the cube $\mathcal{I} = [0,1]^d$;

(V2) the function $p = \delta\pi$ is bounded below 1 uniformly on $\mathcal{I}$ and in $N \ge 1$;

(V3) the fixed, nonnegative function $\pi$ has two Hölder-continuous
derivatives on $\mathcal{J}$;

(V4) for some $\varepsilon > 0$, $h + \nu\delta h + \nu^2/(N^{1-\varepsilon} h^d \delta) \to 0$ as $N \to \infty$;

(V5) $C_1(\delta N)^4 \le \nu^{d+4} \le C_2 N^{d+3}/\delta$ for constants $C_1, C_2 > 0$.

Theorem 5.1. Assume that Conditions U and V hold, and that $\nu\delta = O(1)$.
Then, for each $x \in \mathcal{I}$,

$$\hat p(x) = p(x) + O_p\{(\delta/Nh^d)^{1/2} + \delta h^2\}. \tag{5.2}$$

The "$O_p$" term on the right-hand side of (5.2) has exactly the same size
as the dominant remainder term, $A(x)V(x) + B(x)$, on the right-hand side
of (3.4) in Theorem 3.1, provided of course that we take $d = 1$ in
Theorem 5.1. Refinements given in Theorem 3.1 and in the results in Section 3.3
can also be derived in the present setting.

Theorem 5.1 is proved similarly to Theorem 3.1, and so is not derived in
detail here. The main difference in the argument comes from incorporating
a slightly different definition of $\hat p$, given by (5.1). For example, suppose $\hat p$
is as defined at (5.1), and note that $E(m) = \nu_1 + O\{\nu_1(\nu_1/N)^2\}$, where
$\nu_1(x) = \nu f(x)$ and $f$ denotes the density of $X$. Since, in addition,
$m - E(m) = O_p(\nu^{1/2})$, then $m = \nu_1(1 + \Delta)^{-1}$ where $|\Delta| = O_p\{\nu^{-1/2} + (\nu/N)^2\}$,
and, much as in the argument leading to (6.7),

$$
\begin{aligned}
\hat p = 1 - \hat\mu^{1/m} &= 1 - (\hat\mu^{1/\nu_1})^{1+\Delta} \\
&= 1 - [1 - p + O_p\{\delta h^2 + (\delta/Nh^d)^{1/2}\}]^{1+\Delta} \\
&= 1 - (1 - p)[1 + O_p\{\delta h^2 + (\delta/Nh^d)^{1/2} + \delta|\Delta|\}] \\
&= p + O_p[\delta h^2 + (\delta/Nh^d)^{1/2} + \delta\{\nu^{-1/2} + (\nu/N)^2\}].
\end{aligned}
\tag{5.3}
$$
Now, $\delta h^2 + (\delta/Nh^d)^{1/2}$ is minimized by taking $h = (N\delta)^{-1/(d+4)}$, and for
this choice of $h$ we have

$$\delta^{-1}\{\delta h^2 + (\delta/Nh^d)^{1/2}\} \asymp (\delta N)^{-2/(d+4)}.$$

This quantity is not of smaller order than $\nu^{-1/2} + (\nu/N)^2$ if and only if both
$\nu^{-1/2} = O\{(\delta N)^{-\rho}\}$ and $(\nu/N)^2 = O\{(\delta N)^{-\rho}\}$, where $\rho = 2/(d + 4)$. This is
in turn equivalent to

$$C_1(\delta N)^{4/(d+4)} \le \nu \le C_2(N^{d+3}/\delta)^{1/(d+4)}$$

for constants $C_1, C_2 > 0$, which is also equivalent to (V5). Therefore if (V5)
holds, then we can deduce (5.2) from (5.3).
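The balancing argument above can be checked numerically. In the sketch below the function names are hypothetical, and `c1` and `c2` merely stand in for the unspecified constants $C_1$ and $C_2$ of (V5); setting them to 1 is an assumption made for illustration only.

```python
def balancing_bandwidth(n, delta, d):
    """h = (N*delta)^(-1/(d+4)), which equates delta*h^2 and (delta/(N*h^d))^(1/2)."""
    return (n * delta) ** (-1.0 / (d + 4))

def error_terms(n, delta, d, h):
    """The two terms in the remainder of (5.2): (bias-type term, stochastic term)."""
    bias = delta * h ** 2
    sd = (delta / (n * h ** d)) ** 0.5
    return bias, sd

def nu_range(n, delta, d, c1=1.0, c2=1.0):
    """Range of nu permitted by (V5); c1, c2 are placeholders for C_1, C_2."""
    lower = c1 * (delta * n) ** (4.0 / (d + 4))
    upper = c2 * (n ** (d + 3) / delta) ** (1.0 / (d + 4))
    return lower, upper
```

For example, `error_terms(n, delta, d, balancing_bandwidth(n, delta, d))` returns two equal numbers, each equal to $\delta(\delta N)^{-2/(d+4)}$.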

6. Technical arguments. 

6.1. Proof of Theorem 3.1. Let $D_j$ equal the maximum of $|X_{ij} - \bar X_j|$ over
$i = 1, \ldots, \nu$. The ratio $\nu/N$ equals the order of magnitude of the expected
value of the width of the group that contains $x \in \mathcal{I}$, and it can be proved
that

$$\text{for each } \varepsilon > 0,\ D_j = O_p(\nu/N^{1-\varepsilon}) \text{ uniformly in } j \text{ such that } |\bar X_j - x| \le Ch \text{ and } x \in \mathcal{I}. \tag{6.1}$$

Note that, by (T4), $\nu/N^{1-\varepsilon} \to 0$ for sufficiently small $\varepsilon > 0$.

For $k = 1, 2$ let $p^{(k)}$ be the $k$th derivative of $p$, and put $p_k = p^{(k)}/\{k!(1 - p)\}$.
Let $\eta > 0$ denote the exponent of Hölder continuity of $p''$ on $\mathcal{I}$; see (T3);
that is, $|p''(x_1) - p''(x_2)| = O(|x_1 - x_2|^\eta)$ uniformly in $x_1, x_2 \in \mathcal{I}$. Then,
using (6.1) it can be proved that for each $\varepsilon > 0$,



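$$
\begin{aligned}
E(Z_j^* \mid \mathcal{X}) &= \prod_{i=1}^{\nu} \{1 - p(X_{ij})\} \\
&= \{1 - p(\bar X_j)\}^\nu \prod_{i=1}^{\nu} \{1 - p_1(\bar X_j)(X_{ij} - \bar X_j) + O_p(\delta D_j^2)\} \\
&= \{1 - p(\bar X_j)\}^\nu \prod_{i=1}^{\nu} \exp\{-p_1(\bar X_j)(X_{ij} - \bar X_j) + O_p(\delta D_j^2)\} \\
&= \{1 - p(\bar X_j)\}^\nu \exp\Bigl\{-\sum_{i=1}^{\nu} p_1(\bar X_j)(X_{ij} - \bar X_j) + O_p(\nu\delta D_j^2)\Bigr\} \\
&= \{1 - p(\bar X_j)\}^\nu \exp\{O_p(\nu\delta D_j^2)\} \\
&= \{1 - p(\bar X_j)\}^\nu \{1 + O_p(\nu^3\delta/N^{2-\varepsilon})\},
\end{aligned}
\tag{6.2}
$$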

uniformly in the sense of (6.1) and for each $\varepsilon > 0$. [Assumption (T4) implies
that $\nu^3\delta/N^{2-\varepsilon} \to 0$ for some $\varepsilon > 0$.] Observe too that, uniformly in the same
sense,

$$
\begin{aligned}
\{1 - p(\bar X_j)\}^\nu &= \{1 - p(x)\}^\nu \{1 - p_1(x)(\bar X_j - x) - p_2(x)(\bar X_j - x)^2 + O_p(\delta h^{2+\eta})\}^\nu \\
&= \{1 - p(x)\}^\nu \bigl[1 - \nu p_1(x)(\bar X_j - x) + \{\tfrac12\nu(\nu - 1)p_1(x)^2 - \nu p_2(x)\}(\bar X_j - x)^2 \\
&\qquad + O_p(\nu\delta h^{2+\eta} + \nu^3\delta^3 h^3)\bigr],
\end{aligned}
\tag{6.3}
$$

again uniformly in the sense of (6.1). [Note that, by (T4), $\nu\delta h \to 0$.]
Combining (3.2), (T4), (S1), (S2), (S4), (6.2) and (6.3) we deduce that, for each
$\varepsilon > 0$ and each $x \in \mathcal{I}$,

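$$
\begin{aligned}
\bar\mu(x) &= E\{\hat\mu(x) \mid \mathcal{X}\} = \sum_j w_j(x)\,E(Z_j^* \mid \mathcal{X}) \Big/ \sum_j w_j(x) \\
&= \{1 - p(x)\}^\nu \Bigl[1 + \bigl\{\tfrac12\nu(\nu - 1)p_1(x)^2 - \nu p_2(x)\bigr\} \frac{\sum_j w_j(x)(\bar X_j - x)^2}{\sum_j w_j(x)} \\
&\qquad + O_p(\nu\delta h^{2+\eta} + \nu^3\delta^3 h^3 + \nu^3\delta N^{\varepsilon-2})\Bigr] \\
&= \{1 - p(x)\}^\nu \bigl[1 + h^2\{\tfrac12\nu(\nu - 1)p_1(x)^2 - \nu p_2(x)\}b(x) + o_p(\nu\delta h^2 + \nu^3\delta N^{\varepsilon-2})\bigr],
\end{aligned}
$$

whence, for all $\varepsilon > 0$,

$$\bar\mu(x)^{1/\nu} = \{1 - p(x)\}\bigl[1 - h^2\{p_2(x) - \tfrac12(\nu - 1)p_1(x)^2\}b(x) + o_p(\delta h^2 + \nu^2\delta N^{\varepsilon-2})\bigr], \tag{6.4}$$

uniformly in $x \in \mathcal{I}$. Hence, defining

$$\Delta(x) = \hat\mu(x) - \bar\mu(x) = \sum_j w_j(x)\{Z_j^* - E(Z_j^* \mid \mathcal{X})\} \Big/ \sum_j w_j(x), \tag{6.5}$$

noting that $1 - p$ is bounded away from zero [see (T2)], and taking the
argument of the functions below to equal the specific point $x$ referred to
in (3.4), we deduce that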

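$$
\begin{aligned}
\hat p &= 1 - \hat\mu^{1/\nu} = 1 - (\bar\mu + \Delta)^{1/\nu} \\
&= 1 - (1 - p)\bigl[1 - h^2\{p_2 - \tfrac12(\nu - 1)p_1^2\}b + o_p(\delta h^2 + \nu^2\delta N^{\varepsilon-2})\bigr] \\
&\qquad - \{1 + o_p(1)\}\nu^{-1}(1 - p)^{-(\nu-1)}\Delta + O_p\{\nu^{-1}(1 - p)^{-(2\nu-1)}\Delta^2\} \\
&= p + (1 - p)\bigl[h^2\{p_2 - \tfrac12(\nu - 1)p_1^2\}b + o_p(\delta h^2 + \nu^2\delta N^{\varepsilon-2})\bigr] \\
&\qquad - \{1 + o_p(1)\}\nu^{-1}(1 - p)^{-(\nu-1)}\Delta + O_p\{\nu^{-1}(1 - p)^{-(2\nu-1)}\Delta^2\}
\end{aligned}
\tag{6.6}
$$

$$
\hat p = p + (1 - p)\bigl[h^2\{p_2 - \tfrac12(\nu - 1)p_1^2\}b + o_p(\delta h^2)\bigr] - \{1 + o_p(1)\}\nu^{-1}(1 - p)^{-(\nu-1)}\Delta,
\tag{6.7}
$$

where (6.6) holds without the assumption $\nu\delta = O(1)$ [it holds under either
that condition or (3.11)], but (6.7) requires $\nu\delta = O(1)$. Note that, by (T4),
$\nu/N^{1-\varepsilon}h \to 0$ for some $\varepsilon > 0$, and so $\nu^2\delta N^{2\varepsilon-2}/(\delta h^2) = (\nu/N^{1-\varepsilon}h)^2 \to 0$.
Additionally, it will follow from (6.9) below that, when $\nu\delta = O(1)$,
$\Delta = O_p\{(\nu^2\delta/Nh)^{1/2}\}$, and by (T4), $\nu^2\delta/Nh \to 0$, so $\Delta = o_p(1)$. The identity
leading from (6.6) to (6.7) follows from this property.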
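The step from the smoothed quantity to $\hat p$ is a first-order expansion of $\mu \mapsto 1 - \mu^{1/\nu}$ around $(1 - p)^\nu$. The sketch below compares the exact transformation with the linear term $\nu^{-1}(1 - p)^{-(\nu-1)}\Delta$ that appears in (6.7), ignoring the bias term; the function names are hypothetical and the numbers in the usage note are illustrative only.

```python
def p_from_mu(mu, nu):
    """Exact back-transformation p = 1 - mu^(1/nu)."""
    return 1.0 - mu ** (1.0 / nu)

def linearised_p(p, nu, delta_err):
    """First-order approximation: p - nu^{-1} (1 - p)^{-(nu - 1)} * Delta."""
    return p - delta_err / (nu * (1.0 - p) ** (nu - 1))
```

For instance, with $p = 0.05$, $\nu = 10$ and a perturbation $\Delta = 0.01$ of $(1 - p)^\nu$, the two values differ by less than $\nu^{-1}(1 - p)^{-(2\nu-1)}\Delta^2$, the size of the quadratic remainder in (6.6).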

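Observe that, by (6.2) and (6.3), $E(Z_j^* \mid \mathcal{X}) = \{1 + o_p(1)\}\{1 - p(x)\}^\nu$ and

$$1 - E(Z_j^* \mid \mathcal{X}) = 1 - \{1 - p(x)\}^\nu + O_p[\{1 - p(x)\}^\nu(\nu\delta h + \nu^3\delta N^{\varepsilon-2})],$$

uniformly in $j$ such that $|\bar X_j - x| \le Ch$, where $C$ is as in (T5), and moreover,

$$\operatorname{var}(\Delta \mid \mathcal{X}) = \frac{\sum_j w_j^2 \operatorname{var}(Z_j^* \mid \mathcal{X})}{(\sum_j w_j)^2} = \frac{\sum_j w_j^2\, E(Z_j^* \mid \mathcal{X})\{1 - E(Z_j^* \mid \mathcal{X})\}}{(\sum_j w_j)^2}.$$

[Here and in (6.8)-(6.10) the argument of the functions is the point $x$
in (3.4).] Therefore, by (S3),

$$
\operatorname{var}(\Delta \mid \mathcal{X}) = \{1 + o_p(1)\}(\nu/Nh)(1 - p)^\nu\{1 - (1 - p)^\nu\}v + O_p[(\nu/Nh)(\nu\delta h + \nu^3\delta N^{\varepsilon-2})].
\tag{6.8}
$$

Properties (T4) and (6.8), and Lyapounov's central limit theorem (see the
next paragraph for details), imply that when $\nu\delta = O(1)$ and $\pi(x) > 0$ [the
latter is assumed here and below; the proof when $\pi(x) = 0$ is simpler], we
can write

$$
\begin{aligned}
\Delta &= \bigl((\nu/Nh)(1 - p)^\nu\{1 - (1 - p)^\nu\}v + O_p[(\nu/Nh)(\nu\delta h + \nu^3\delta N^{\varepsilon-2})]\bigr)^{1/2} V_4 \\
&= \{1 + o_p(1)\}[(\nu/Nh)(1 - p)^\nu\{1 - (1 - p)^\nu\}v]^{1/2} V_4,
\end{aligned}
\tag{6.9}
$$

where the second identity follows from the fact that $h + \nu^2 N^{\varepsilon-2} \to 0$ for some
$\varepsilon > 0$ [see (T4)], and $V_4$ denotes a random variable that is asymptotically
distributed as normal $N(0, 1)$. This result and (6.7) imply that

$$
\hat p = p + (1 - p)\bigl[h^2\{p_2 - \tfrac12(\nu - 1)p_1^2\}b + o_p(\delta h^2)\bigr] - \{1 + o_p(1)\}[(\nu Nh)^{-1}(1 - p)^{2-\nu}\{1 - (1 - p)^\nu\}v]^{1/2} V_4.
\tag{6.10}
$$

Result (3.4) follows from (6.10).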
When applying a generalized form of Lyapounov's theorem to establish
a central limit theorem for $\Delta$, conditional on $\mathcal{X}$, we should, in view of (S4),
prove that for some integer $k > 2$, $[(\nu/Nh)(1 - p)^\nu\{1 - (1 - p)^\nu\}v]^{-k/2}(\nu/Nh)^{k-1} \to 0$.
When $\nu\delta = O(1)$ this is equivalent to $(\delta/Nh)^{-k/2}(\nu/Nh)^{k-1} \to 0$,
and hence to $(Nh/\nu)^2(\nu^2/Nh\delta)^k \to 0$; call this result (R). Now, (T4)
ensures that for some $\varepsilon > 0$, $\nu^2/N^{1-\varepsilon}h\delta \to 0$. Therefore (R) holds for all
sufficiently large $k$.

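Next we outline the derivation of (3.7). It can be proved from (3.3) that
if $C_1 > 0$ is given, if $C_2 = C_2(C_1) > 0$ is chosen sufficiently large, if $\mathcal{I}_N$ is
a regular grid of $N^{C_2}$ points in $\mathcal{I}$ and if, for each $x \in \mathcal{I}$, we define $x_N$ to be
the point in $\mathcal{I}_N$ nearest to $x$, then

$$P\Bigl\{\sup_{x \in \mathcal{I}} |\Delta(x) - \Delta(x_N)| \le N^{-C_1}\Bigr\} \to 1. \tag{6.11}$$

Note that, by (T4), applying (S3), (S4), Rosenthal's and Markov's
inequalities, we can prove that, for each $C, \varepsilon > 0$,
$\sup_{x \in \mathcal{I}} P\{|\Delta(x)| > N^\varepsilon(\nu^2\delta/Nh)^{1/2} \mid \mathcal{X}\} = O_p(N^{-C})$.
It follows that, for all $C, \varepsilon > 0$,

$$P\Bigl\{\sup_{x \in \mathcal{I}_N} |\Delta(x)| > N^\varepsilon(\nu^2\delta/Nh)^{1/2} \Bigm| \mathcal{X}\Bigr\} = O_p(N^{-C}). \tag{6.12}$$

Together (6.11) and (6.12) imply that, for each $C, \varepsilon > 0$,

$$P\Bigl\{\sup_{x \in \mathcal{I}} |\Delta(x)| > N^\varepsilon(\nu^2\delta/Nh)^{1/2}\Bigr\} \to 0. \tag{6.13}$$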

Results (6.4) (which holds uniformly in $x \in \mathcal{I}$) and (6.13) imply that (6.7)
holds uniformly in $x \in \mathcal{I}$. Hence,



(6.14) 



j^+j^-Hi-pr^AY 

+ 0p |(^ 2 ) 2 + ^(A/^) 2 }. 



Conditional on $\mathcal{X}$ the random variable $\Delta$, at (6.5), equals a sum of
independent random variables with zero means, and using that property, Condition S
(which, for this part of the theorem, holds uniformly in $x \in \mathcal{I}$) and (3.4), it
can be proved that



(6.15) E 

(6.16) var 

(6.17) var 



p)-("-i)A} 2 

;i-p)-("-i) A } 



1 



X 



X 



X 



J A 2 + o p (5/Nh), 
o p {(5/Nh) 2 }, 
o p {(5/Nh) 2 + (5h 2 ) 4 }. 
Result (6.15) follows from (6.8). To derive (6.16), note that by (6.4) we have,
uniformly in $x_1, x_2 \in \mathcal{I}$,



(6.18) 

where 
ti(xi,x 2 ) 

t 2 (xi,x 2 ) 



E{A(x 1 ) 2 A{x 2 ) 2 I X} = £{A(xi) 2 | X}E{A(x 2 ) 2 \ X} 

+ O p {ti(xi,x 2 )}, 



Y,^xi?Vi{x2?E[{Zl-E{m I x)Y I x\ 



Op{t 2 (x!,X 2 )}, 



ZjWjixrfwjjxtfvarjZ* \ X) 

{E^ORE;™^)} 2 



= O p 
v5 V 

ml) 



again uniformly in $x_1, x_2 \in \mathcal{I}$. [The last and second-last identities here follow
from (T4) and (S4), resp.] Noting these bounds, defining
$\xi_1 = \{\nu^{-1}(1 - p)^{-(\nu-1)}\}^2$ and integrating (6.18) over $x_1, x_2 \in \mathcal{I}$, we deduce that



which implies (6.16). 



X 



Ci(x)E{A(x) 2 | X}dx 



+ o p {(v5/Nh) 2 }, 



To derive (6.17), define £ 2 = #6 and e, = E[{Z* - E(Z* \ X)} 2 \ X], 



write M for the left-hand side of (6.17), and note that 



M 



6(^1)6(^2) 



XJX 



yi, "•/(•'•i)''- / (-'-2)'. ; 



In view of (T5), $w_j(x) = 0$ if $|\bar X_j - x| > Ch$, and so the series in the
numerator inside the integrand can be confined to indices $j$ for which both
$|\bar X_j - x_1| \le Ch$ and $|\bar X_j - x_2| \le Ch$. Therefore the integrand equals zero
unless $|x_1 - x_2| \le 2Ch$. Hence, defining $J(x_1, x_2) = 1$ if $|x_1 - x_2| \le 2Ch$, and
$J(x_1, x_2) = 0$ otherwise, using the Cauchy-Schwarz inequality to derive both
the inequalities below and writing $\|\mathcal{I}\|$ for the length of the interval $\mathcal{I}$, we
have



M< 



XJX 



J(xi, x 2 )£ 2 (xi)£ 2 (x 2 ) 



(6.19) 



W^Wjixkfej/ <j J2wj(x k ) 
k=i j 



1/2 



dx± dx 2 
IX JX 



J(xi,x 2 )£2(zi)60e2) 



2 1 1/2 

f[ var{A(x fc ) | X} 
U=i 




<||X||( J J J{xi,x 2 ) 




dx\ dx2 



' 2 

n>(*fc) 2 var{A(x fc )|*} 
U=l 

Using (6.8) to show that $\operatorname{var}\{\Delta(x_k) \mid \mathcal{X}\} = O(\nu^2\delta/Nh)$, uniformly in $x_k \in \mathcal{I}$,
noting that $B = O(\delta h^2)$ uniformly in $x \in \mathcal{I}$ [the bound at (3.7) holds
uniformly in the argument of $B$] and observing that $\xi_1(x) = O(\nu^{-2})$ uniformly
in $x \in \mathcal{I}$, whence it follows from the bound for $B$ that $\xi_2(x) = O(\delta h^2\nu^{-2})$
uniformly in $x \in \mathcal{I}$, we deduce from (6.19) that

1/2 

M = Op[(i/ 2 <VA^){(<^" 2 )} 2 ]( / / J(x 1 ,x 2 )dx 1 dx 2 

(6.20) 1 ^ 

= O p (5 2 h 7 / 2 /N) = o p {{5/Nh) 2 + (<5/i 2 ) 4 }. 

Result (6.17) follows directly from (6.20). 

6.2. Proof of Theorem 3.2. The proof is similar to that of the first part
of Theorem 3.1, the main difference occurring at the point at which the
remainder term, $O_p(R)$ where $R = \nu^{-1}(1 - p)^{-(2\nu-1)}\Delta^2$, in (6.6), is shown
to be negligible relative to the term $\nu^{-1}(1 - p)^{-(\nu-1)}\Delta$ there. It suffices to
prove that $(1 - p)^{-\nu}\Delta \to 0$ in probability, or equivalently, in view of (6.9),
that $(\nu/Nh)(1 - p)^{-\nu} \to 0$. However, the latter result is ensured by (3.11).

6.3. Proof of Theorem 3.3. Without loss of generality, the point $x$ in (3.16)
is $x = 0$. Recall that $p^0 \equiv \delta$, and take $p^1(u) = \delta\{1 + h^2\psi(u/h)\}$, where $\psi$ is
bounded and has two bounded derivatives on the real line, is supported on
$[-\tfrac12, \tfrac12]$ and satisfies $\psi(0) \ne 0$. The respective functions $\pi^0 \equiv 1$ and
$\pi^1(u) = 1 + h^2\psi(u/h)$ satisfy (3.14). [The quantity $h = h(N) > 0$ here is not a
bandwidth, but converges to 0 as $N \to \infty$.] Therefore, $p^0(u) = p^1(u)$ except when
$u \in (-\tfrac12 h, \tfrac12 h)$. We assume that $\nu\delta \to \infty$ as $N \to \infty$, and consider the
problem of discriminating between $p^0$ and $p^1$ using the data pairs $(\mathcal{X}_j, Y_j^*)$.

Without loss of generality, we confine attention to those pairs $(\mathcal{X}_j, Y_j^*)$ for
which $\mathcal{X}_j$ is wholly contained in $[-\tfrac12 h, \tfrac12 h]$. Pairs for which $\mathcal{X}_j$ has no
intersection with $[-\tfrac12 h, \tfrac12 h]$ convey no information for discriminating between $p^0$
and $p^1$, and it is readily proved that including pairs for which $\mathcal{X}_j$ overlaps
the boundary does not affect the results we derive below. In a slight abuse of
notation we shall take the integers $j$ for which $\mathcal{X}_j \subset [-\tfrac12 h, \tfrac12 h]$ to be $1, \ldots, m$,
where $m = hN/\nu + o_p(1)$ and is assumed to be an integer.

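The likelihood of the data pairs $(\mathcal{X}_j, Y_j^*)$ for $1 \le j \le m$, conditional on
$\mathcal{X} = \{X_1, \ldots, X_N\}$, is $\prod_{j=1}^m P_j^{Y_j^*}(1 - P_j)^{1 - Y_j^*}$, where
$P_j = P(Y_j^* = 1 \mid \mathcal{X}) = 1 - \prod_{i=1}^{\nu}\{1 - p(X_{ij})\}$. Let $P_j^0$ and $P_j^1$ denote the versions of $P_j$ when $p = p^0$
and $p = p^1$, respectively. Also, let $\Theta_j^+ = P_j^1/P_j^0$ and $\Theta_j^- = (1 - P_j^1)/(1 - P_j^0)$.
In this notation the log-likelihood ratio statistic is given by

$$
\begin{aligned}
L &= \sum_{j=1}^m \{Y_j^*\log(\Theta_j^+) + (1 - Y_j^*)\log(\Theta_j^-)\} \\
&= \sum_{j=1}^m (1 - Y_j^*)\log(\Theta_j^-/\Theta_j^+) + \sum_{j=1}^m \log(\Theta_j^+),
\end{aligned}
\tag{6.21}
$$

and therefore, $E(L \mid \mathcal{X}) = \sum_{j=1}^m (1 - P_j)\log(\Theta_j^-/\Theta_j^+) + \sum_{j=1}^m \log(\Theta_j^+)$,
$\operatorname{var}(L \mid \mathcal{X}) = \sum_{j=1}^m P_j(1 - P_j)\{\log(\Theta_j^-/\Theta_j^+)\}^2$. Writing $E^0$ and $\operatorname{var}^0$ to denote
expectation and variance when $p = p^0$, we deduce that

$$E^0(L \mid \mathcal{X}) = (1 - \delta)^\nu \sum_{j=1}^m \log(\Theta_j^-/\Theta_j^+) + \sum_{j=1}^m \log(\Theta_j^+), \tag{6.22}$$

$$\operatorname{var}^0(L \mid \mathcal{X}) = (1 - \delta)^\nu\{1 - (1 - \delta)^\nu\}\sum_{j=1}^m \{\log(\Theta_j^-/\Theta_j^+)\}^2. \tag{6.23}$$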
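The quantities in (6.21) and the conditional moments under $p = p^0$ can be computed directly from the pool-level probabilities. The sketch below is an illustration with hypothetical function names; `p0` and `p1` are assumed to be arrays holding $P_j^0$ and $P_j^1$, and `y_star` the pooled test results.

```python
import numpy as np

def pool_positive_prob(p_members):
    """P_j = 1 - prod_i {1 - p(X_ij)} for an (m, nu) array of individual probabilities."""
    return 1.0 - np.prod(1.0 - np.asarray(p_members), axis=1)

def log_likelihood_ratio(y_star, p0, p1):
    """L = sum_j {Y*_j log(Theta+_j) + (1 - Y*_j) log(Theta-_j)}."""
    log_plus = np.log(p1 / p0)
    log_minus = np.log((1.0 - p1) / (1.0 - p0))
    return float(np.sum(y_star * log_plus + (1.0 - y_star) * log_minus))

def llr_moments_under_p0(p0, p1):
    """Conditional mean and variance of L when the true pool probabilities are p0."""
    log_plus = np.log(p1 / p0)
    log_ratio = np.log((1.0 - p1) / (1.0 - p0)) - log_plus
    mean = np.sum((1.0 - p0) * log_ratio + log_plus)
    var = np.sum(p0 * (1.0 - p0) * log_ratio ** 2)
    return float(mean), float(var)
```

When $p^0 \equiv \delta$, every $P_j^0$ equals $1 - (1 - \delta)^\nu$, and the two moments reduce to (6.22) and (6.23).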

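Assume for the time being that

$$\nu\delta h^2 \to 0 \tag{6.24}$$

as $N \to \infty$, and observe that, since $1 - P_j^0 = (1 - \delta)^\nu$, then

$$
\begin{aligned}
\Theta_j^- &= (1 - \delta)^{-\nu}\prod_{i=1}^{\nu}[1 - \delta\{1 + h^2\psi(X_{ij}/h)\}] \\
&= \prod_{i=1}^{\nu}\{1 - \rho h^2\psi(X_{ij}/h)\} = 1 - \rho h^2 S_j + R_j,
\end{aligned}
\tag{6.25}
$$

where $\rho = \delta/(1 - \delta)$, $S_j = \sum_{i=1}^{\nu}\psi(X_{ij}/h)$ and $R_j = O_p(\nu\rho^2 h^4)$ uniformly in
$1 \le j \le m$. [We used (6.24) to derive the last identity in (6.25). To obtain
uniformity in the bound for $R_j$, and in later bounds, we used the fact that $\psi$
is bounded.] Hence,

$$\log(\Theta_j^-) = -\{(\rho h^2 S_j - R_j) + \tfrac12(\rho h^2 S_j - R_j)^2 + \tfrac13(\rho h^2 S_j - R_j)^3 + \cdots\}.$$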

Similarly, since

$$
\begin{aligned}
P_j^1 &= 1 - (1 - P_j^1) \\
&= 1 - (1 - P_j^0)\prod_{i=1}^{\nu}\{1 - \rho h^2\psi(X_{ij}/h)\} \\
&= 1 - (1 - P_j^0)(1 - \rho h^2 S_j + R_j) \\
&= P_j^0 + (1 - P_j^0)(\rho h^2 S_j - R_j),
\end{aligned}
$$

then

log(et) = log{l + (ptfSj - Rj)(l - P?)/P°} 



+ O p {(l-5f»(vph 2 ) 2 }, 
uniformly in 1 < j < m. It follows that 

io g (eT/e+) 

= -Uph% - Rj) + \{ph 2 Sj - R,) 2 + ^(ph'S, - R,) 3 + • • j 
( 6 - 26 ) " x-^-ly ^Sj - R 3 ) + 5) 2 »{ph 2 S 3 - R,f 



+ O p {(l-5) 3 »(vph 2 ) 2 } 
-ph 2 Sj + O p {vp 2 h 4 + (1 - 5) u uph 2 }, 



(i-5riog(e-/e;) + io g (G+) 

= -(1 - SfUptfSj - Rj) + \( P h 2 S, - R 3 ) 2 + ±(ph 2 Sj - R 3 ) 3 + • • j 



(1-5) 



2v 



+ O p {(l-5) 3 »(vph 2 ) 2 } 
= "(1 " SrS.\(ph 2 S j - R 3 ) 2 + \{ph 2 S 3 - R,) 3 + • • j 

- 1(1 - SF{ph% - Rj) 2 + O p {(l - 5?%v P h 2 ) 2 } 



.(i-sn P h 2 s 3 ) 2 +o p {(i-5nuph 2 ) 2 }, 



i 

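uniformly in $1 \le j \le m$. Using (6.22), (6.23), (6.26) and (6.27) we deduce
that

$$E^0(L \mid \mathcal{X}) = -\tfrac12(1 - \delta)^\nu(\rho h^2)^2\sum_{j=1}^m S_j^2 + o_p\{m(1 - \delta)^\nu(\nu\rho h^2)^2\},$$

$$\operatorname{var}^0(L \mid \mathcal{X}) = \{1 + o_p(1)\}(1 - \delta)^\nu(\rho h^2)^2\sum_{j=1}^m S_j^2 + o_p\{m(1 - \delta)^\nu(\nu\rho h^2)^2\}.$$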
3=1 
Choose $h$ so that

(6.28) the squared mean and the variance are of the same order.

In particular, take $\{m(1 - \delta)^\nu(\nu\rho h^2)^2\}^2 = C_1 m(1 - \delta)^\nu(\nu\rho h^2)^2$, and hence

$$m(1 - \delta)^\nu(\nu\rho h^2)^2 = C_2 + o_p(1) \tag{6.29}$$

or equivalently, using the fact that $m = Nh/\nu + o_p(1)$,

$$h = C_3\{(N\nu\rho^2)^{-1}(1 - \delta)^{-\nu}\}^{1/5}, \tag{6.30}$$

where $C_1, C_2, C_3$ are positive constants; $C_3$ can be chosen arbitrarily. It
follows that

$$\rho h^2 = C_3^2(\rho/N^2)^{1/5}\nu^{-2/5}\lambda_N^2, \tag{6.31}$$

where $\lambda_N = (1 - \delta)^{-\nu/5}$. If $h$ is given by (6.31), then $\nu\rho h^2 = C_3^2(\nu^3\rho/N^2)^{1/5}\lambda_N^2$
and therefore (6.24) follows from (3.15).
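The algebra linking (6.29) and (6.30) can be verified numerically: substituting $m = Nh/\nu$ into $m(1 - \delta)^\nu(\nu\rho h^2)^2$ gives $N\nu\rho^2(1 - \delta)^\nu h^5$, which equals $C_3^5$ when $h$ is chosen as in (6.30). The sketch below uses hypothetical function names and treats `c3` as a free constant.

```python
def h_from_630(n, nu, delta, c3=1.0):
    """h = C3 * {(N nu rho^2)^{-1} (1 - delta)^{-nu}}^{1/5}, with rho = delta/(1 - delta)."""
    rho = delta / (1.0 - delta)
    return c3 * ((1.0 / (n * nu * rho ** 2)) * (1.0 - delta) ** (-nu)) ** 0.2

def balance_quantity(n, nu, delta, h):
    """m (1 - delta)^nu (nu rho h^2)^2 with m = N h / nu (remainder term ignored)."""
    rho = delta / (1.0 - delta)
    m = n * h / nu
    return m * (1.0 - delta) ** nu * (nu * rho * h ** 2) ** 2
```

Evaluating `balance_quantity` at `h_from_630(n, nu, delta, c3)` returns `c3 ** 5` up to rounding, whatever the values of `n`, `nu` and `delta`.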

It can be shown that, conditional on the explanatory variables, the
log-likelihood ratio $L$, centred and scaled using its conditional mean and variance, is
asymptotically normally distributed with zero mean and unit variance. (We shall
give a proof below.) Therefore by taking $C_1$, and hence $C_3$, sufficiently small,
we can ensure that: (i) The probability of discriminating between $p^0$ and $p^1$,
when $p = p^0$, is bounded below 1 as $N \to \infty$. [This follows from (6.28).]
Similarly it can be proved that: (ii) The probability of discriminating
between $p^0$ and $p^1$, when $p = p^1$, is bounded below 1. Consider the
assertion: (iii) $\hat p(0) - p(0)$ converges in probability to 0, along a subsequence,
at a strictly faster rate than $h^2$. If (iii) is true, then the error rate of the
classifier which asserts that $p = p^0$ if $\hat p(0)$ is closer to $p^0(0)$ than to $p^1(0)$,
and $p = p^1$ otherwise, converges to 0 as $N \to \infty$. However, properties (i)
and (ii) show that even the optimal classifier, based on the likelihood ratio
rule, does not enjoy this degree of accuracy, and so (iii) must be false. This
proves (3.16).

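Finally we derive the asymptotic normality of $L$ claimed in the previous
paragraph. We do this using Lindeberg's central limit theorem, as follows. In
view of the definition of $L$ at (6.21) it is enough to prove that for each $\eta > 0$,

$$
\begin{aligned}
S_{N1}(\eta) \equiv a(\mathcal{X})^{-2}\sum_{j=1}^m E^0\bigl[&|Y_j^* - E(Y_j^* \mid \mathcal{X})|^2 a_j(\mathcal{X})^2 \\
&\times I\{|Y_j^* - E(Y_j^* \mid \mathcal{X})|\,a_j(\mathcal{X}) > \eta\,a(\mathcal{X})\} \bigm| \mathcal{X}\bigr] \to 0
\end{aligned}
\tag{6.32}
$$

in probability, where we define

$$a_j(\mathcal{X})^2 = \{\log(\Theta_j^-/\Theta_j^+)\}^2 = (\rho h^2 S_j)^2 + o_p\{(\nu\delta h^2)^2\}, \tag{6.33}$$

$$a(\mathcal{X})^2 = \sum_{j=1}^m \operatorname{var}^0(Y_j^* \mid \mathcal{X})\,a_j(\mathcal{X})^2 = \{1 + o_p(1)\}C_4 m(\nu\rho h^2)^2(1 - \delta)^\nu \tag{6.34}$$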
3=1 
with $C_4 > 0$ and (6.33) holding uniformly in $j$. [We used (6.26) to obtain
the second identities in each of (6.33) and (6.34).] Since $m = hN/\nu + o_p(1)$,
then, by (6.30) and (6.34), $a(\mathcal{X})^2 \to C_5$ in probability, where $C_5 > 0$. Hence,
by (6.32), with probability converging to 1 as $N \to \infty$,

$$C_6 S_{N1}(\eta) \le S_{N2}(\eta) \equiv \sum_{j=1}^m E^0\bigl[|Y_j^* - E(Y_j^* \mid \mathcal{X})|^2 a_j(\mathcal{X})^2\, I\{|Y_j^* - E(Y_j^* \mid \mathcal{X})|\,a_j(\mathcal{X}) > C_7\} \bigm| \mathcal{X}\bigr],$$

where $C_6, C_7 > 0$ are constants, and $C_7$ depends on $\eta$.

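Note too that, using (6.30) to obtain the second relation below, and (3.15)
to get the last relation, we have $(\nu\rho h^2)^5 \asymp (\nu\delta h^2)^5 \asymp \{(\nu^3\delta)^{1/2}N^{-1}(1 - \delta)^{-\nu}\}^2 \to 0$.
Therefore, (6.24) holds. Since $|Y_j^* - E(Y_j^* \mid \mathcal{X})| \le 1$, then, if
$a_j(\mathcal{X}) \le C_7$, we have $I\{|Y_j^* - E(Y_j^* \mid \mathcal{X})|\,a_j(\mathcal{X}) > C_7\} = 0$. Hence, using (6.26)
and (6.24),

$$
\begin{aligned}
S_{N2}(\eta) &\le \sum_{j=1}^m E^0\{|Y_j^* - E(Y_j^* \mid \mathcal{X})|^2 \mid \mathcal{X}\}\,a_j(\mathcal{X})^2\,I\{a_j(\mathcal{X}) > C_7\} \\
&= (1 - \delta)^\nu\{1 - (1 - \delta)^\nu\}\sum_{j=1}^m a_j(\mathcal{X})^2\,I\{a_j(\mathcal{X}) > C_7\} \\
&\le (1 - \delta)^\nu C_7^{-2}\sum_{j=1}^m a_j(\mathcal{X})^4 = O_p\{(1 - \delta)^\nu m(\nu\delta h^2)^4\} \\
&= o_p\{m(1 - \delta)^\nu(\nu\delta h^2)^2\} = o_p(1),
\end{aligned}
$$

since $m(1 - \delta)^\nu(\nu\delta h^2)^2 = C_2$; see (6.29). This completes the proof of (6.32).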
SUPPLEMENTARY MATERIAL

Additional material (DOI: 10.1214/11-AOS952SUPP; .pdf). The
supplementary article contains a description of Delaigle and Meister's method,
details for bandwidth choice, an alternative procedure for the multivariate
setting and unequal groups, and additional numerical results.